PREFACE


The sixth edition of Research in Education has the same goals as the previous 
editions. The book has been written to be used as a research reference or 
as a text in an introductory course in research methods. It is appropriate 
for graduate students enrolled in a research seminar, for those writing a 
thesis or dissertation, or for those who carry on research as a professional 
activity. All professional workers should be familiar with the methods of 
research and the analysis of data. If only as consumers, professionals should 
understand some of the techniques used in identifying problems, forming 
hypotheses, constructing and using data-gathering instruments, designing 
research studies, and employing statistical procedures to analyze data. They 
should also be able to use this information to interpret and critically analyze 
research reports that appear in professional journals and other publica- 
tions. 

No introductory course can be expected to confer research compe- 
tence, nor can any book present all relevant information. Research skill 
and understanding are achieved only through the combination of course- 
work and experience. Graduate students may find it profitable to carry on 
a small-scale study as a way of learning about research. 

This edition expands and clarifies a number of ideas presented in 
previous editions. Additional concepts, procedures, and examples have
been added, and a few have been deleted. In particular, the description of
historical and qualitative research has been expanded and more details 
added. A new chapter (Chapter 6) has been added on single-subject re- 
search. This edition has been written to conform to the guidelines of the 
American Psychological Association's (APA) Publication Manual (3rd ed.).
The writing style suggested in Chapter 11 is also in keeping with the APA
manual.

Many of the topics covered in this book may be peripheral to the 
course objectives of some instructors. It is not suggested that all of the 
topics in this book be included in a single course. It is recommended that 
instructors use the topics selectively and in the sequence that they find most 
appropriate. The portion of the book not used in those courses can then 
be used by the student in subsequent courses, to assist in carrying out a 
thesis, and/or as a reference. 

This revision benefited from the comments of the second author's 
students who had used the earlier editions of this text. To them and to the 
anonymous manuscript reviewers, we express our appreciation. We wish 
to acknowledge the cooperation of the University of Illinois at Chicago 
Computer Center, SPSS, Inc., and SAS Institute, Inc. We are indebted to 
Penelope Witte for typing the manuscript. Finally, we are grateful to our
wives, Solveig Ager Best and Kathleen Cuerdon-Kahn, for their
encouragement and support.


THE MEANING 
OF RESEARCH 


THE SEARCH FOR KNOWLEDGE 


Human beings are the unique product of their creation and evolution. In 
contrast to other forms of animal life, their more highly developed nervous 
system has enabled them to develop sounds and symbols (letters and num- 
bers) that make possible the communication and recording of their ques- 
tions, observations, experiences, and ideas. 

It is understandable that their greater curiosity, implemented by their 
control of symbols, would lead them to speculate about the operation of 
the universe, the great forces beyond their own control. Over many cen- 
turies, people began to develop what seemed to be plausible explanations. 
Attributing the forces of nature to the working of supernatural powers, 
they believed that the gods, at their whims, manipulated the sun, stars, 
wind, rain, and lightning. 

The appearance of the medicine man or priest, who claimed special 
channels of communication with the gods, led to the establishment of a 
system of religious authority passed on from one generation to another. 
A rigid tradition developed, and a dogma of nature's processes, explained 
in terms of mysticism and the authority of the priesthood, became firmly 
rooted, retarding further search for truth for centuries.

But gradually people began to see that the operations of the forces
of nature were not as capricious as they had been led to believe. They 
began to observe an orderliness in the universe and certain cause-and- 
effect relationships; they discovered that under certain conditions events 
could be predicted with reasonable accuracy. However, these explanations 
were often rejected if they seemed to conflict with the dogma of religious 
authority. Curious persons who raised questions were often punished and 
even put to death when they persisted in expressing doubts suggested by 
such unorthodox explanations of natural phenomena. 

This reliance on empirical evidence or personal experience challenged 
the sanction of vested authority and represented an important step in the 
direction of scientific inquiry. Such pragmatic observation, however, was 
largely unsystematic and further limited by the lack of an objective method. 
Observers were likely to overgeneralize on the basis of incomplete expe- 
rience or evidence, to ignore complex factors operating simultaneously, or 
to let their feelings and prejudices influence both their observations and 
their conclusions. 

It was only when people began to think systematically about thinking 
itself that the era of logic began. The first systematic approach to reasoning, 
attributed to Aristotle and the Greeks, was the deductive method. The 
categorical syllogism was one model of thinking that prevailed among early
philosophers. Syllogistic reasoning established a logical relationship be- 
tween a major premise, a minor premise, and a conclusion. A major premise is 
a self-evident assumption, previously established by metaphysical truth or 
dogma, that concerns a relationship; a minor premise is a particular case 
related to the major premise. Given the logical relationship of these prem- 
ises, the conclusion is inescapable. 

A typical Aristotelian categorical syllogism follows: 


Major Premise: All men are mortal.
Minor Premise: Socrates is a man.
Conclusion: Socrates is mortal.


This deductive method, moving from the general assumption to the
specific application, made an important contribution to the development
of modern problem solving. But it was not fruitful in arriving at new truths.
The acceptance of incomplete or false major premises that were based on
old dogmas or unreliable authority could only lead to error. Semantic
difficulties often resulted from shifting definitions of the terms involved.

Centuries later, Francis Bacon advocated direct observation of
phenomena, arriving at conclusions or generalizations through the evidence
of many individual observations. This inductive process of moving from
specific observations to the generalization freed logic from some of the
hazards and limitations of deductive thinking. Bacon recognized the
obstacle that the deductive process placed in the way of discovering new truth:
It started with old dogmas that religious or intellectual authorities had 
already accepted and thus could be expected to arrive at few new truths. 
These impediments to the discovery of truth, which he termed “idols,” 
were exposed in his Novum Organum, written in 1620. 

The following story, attributed to Bacon, expresses his revolt against 
the authority of the written word, an authority that dominated the search 
for truth during the Middle Ages: 


In the year of our Lord, 1432, there arose a grievous quarrel among the 
brethren over the number of teeth in the mouth of a horse. For thirteen days 
the disputation raged without ceasing. All the ancient books and chronicles 
were fetched out, and wonderful and ponderous erudition was made man- 
ifest. At the beginning of the fourteenth day a youthful friar of goodly bearing 
asked his learned superiors for permission to add a word, and straightway, 
to the wonder of the disputants, whose deep wisdom he sorely vexed, he 
beseeched them in a manner coarse and unheard of, to look in the mouth 
of a horse and find answers to their questionings. At this, their dignity being 
grievously hurt, they waxed exceedingly wroth; and joining in a mighty up- 
roar they flew upon him and smote him hip and thigh and cast him out 
forthwith. For, said they, “Surely Satan hath tempted this bold neophyte to 
declare unholy and unheard-of ways of finding truth, contrary to all the 
teachings of the fathers.” After many days of grievous strife the dove of peace
sat on the assembly, and they, as one man, declaring the problem to be an
everlasting mystery because of a dearth of historical and theological evidence
thereof, so ordered the same writ down. (Mees, 1934, pp. 13-14)


The method of inductive reasoning proposed by Bacon, a method 
new to the field of logic but widely used by the scientists of his time, was 
not hampered by false premises, by the inadequacies and ambiguities of 
verbal symbolism, or by the absence of supporting evidence. 

But the inductive method alone did not provide a completely satis- 
factory system for the solution of problems. Random collection of individual 
observations without a unifying concept or focus often obscured investi- 
gations and therefore rarely led to a generalization or theory. Also, the 
same set of observations can lead to different conclusions and support
different, even opposing, theories.

The deductive method of Aristotle and the inductive method of Bacon 
were fully integrated in the work of Charles Darwin in the nineteenth 
century. During his early career his observations of animal life failed to 
lead to a satisfactory theory of man’s development. The concept of the 
struggle for existence in Thomas Malthus’ Essay on Population intrigued 
Darwin and suggested the assumption that natural selection explains the 
origin of different species of animals. This hypothesis provided a needed 
focus for his investigations. He proceeded to deduce specific consequences 
suggested by the hypothesis. The evidence gathered confirmed the
hypothesis that biological change in the process of natural selection, in which
favorable variations were preserved and unfavorable ones destroyed,
resulted in the formation of new species.

The major premise of the older deductive method was gradually
replaced by an assumption or hypothesis that was subsequently tested by the
collection and logical analysis of data. This deductive-inductive method is
now recognized as an example of a scientific approach.

John Dewey (1938) suggested a pattern that is helpful in identifying
the elements of a deductive-inductive process:


A METHOD OF SCIENCE


1. Identification and definition of the problem

2. Formulation of a hypothesis: an idea as to a probable solution to the
problem, an intelligent guess or hunch

3. Collection, organization, and analysis of data

4. Formulation of conclusions

5. Verification, rejection, or modification of the hypothesis by the test
of its consequences in a specific situation


Although this pattern is a useful reconstruction of some methods of
scientific inquiry, it is not to be considered the only scientific method. There
are many ways of applying logic and observation to problem solving. An
overly rigid definition of the research process would omit many ways in
which researchers go about their tasks. The planning of a study may include
a great deal of exploratory activity, which is frequently intuitive or
speculative and, at times, a bit disorderly. Although researchers must eventually
identify a precise and significant problem, their object may initially be vague
and poorly defined. They may observe situations that seem to suggest
certain possible cause-and-effect relationships and even gather some
preliminary data to examine for possible relevancy to their vaguely conceived
problem. Thus, much research begins with the inductive method. At this
stage, imagination and much speculation are essential to the formulation
of a clearly defined problem that is susceptible to the research process.
Many students of research rightly feel that problem identification is one
of the most difficult and most crucial steps of the research process.

Frequently researchers are interested in complex problems, the full
investigation of which requires a series of studies. This approach is known
as programmatic research and usually combines the inductive and deductive
methods in a continuously alternating pattern. The researcher may begin
with a number of observations from which a hypothesis is derived (inductive
reasoning). Then the researcher proceeds deductively to determine the
consequences that are to be expected if the hypothesis is true. Data are
then collected through the inductive method to verify, reject, or modify
the hypothesis. Based on the findings of this study, the researcher goes on
to formulate more hypotheses to further investigate the complex problem
under study. Thus, the researcher is continually moving back and forth
between the inductive method of observation and data collection, and the
deductive method of hypothesizing the anticipated consequences to events.


The term science may be thought of as an approach to the gathering of 
knowledge rather than as a field of subject matter. Science, put simply, 
consists of two primary functions: (1) the development of theory and (2) 
the testing of substantive hypotheses that are deduced from theory. The 
scientist, therefore, is engaged in the use, modification, and/or creation of 
theory. The scientist may emphasize an empirical approach in which data 
collection is the primary method, a rational approach in which logical and 
deductive reasoning are primary, or a combination of these approaches, 
which is most common. Regardless of the emphasis, the scientist begins 
with a set of ideas that direct the effort and with a goal that entails the 
development or testing of theory.

By attempting to apply the rigorous, systematic observation and anal- 
ysis used in the physical and biological sciences to areas of social behavior, 
the social sciences have grown and have advanced humanity's knowledge 
of itself. The fields of anthropology, economics, education, political science, 
psychology, and social psychology have become recognized as sciences by 
many authorities. To the extent that these fields are founded on scientific 
methodology, they are sciences. Some reject this concept, still defining 
science in terms of subject matter rather than methodology. Historically 
their position can be readily explained. Since scientific methods were first 
used in the investigation of physical phenomena, tradition has identified 
science with the physical world. Only within the last century has the meth- 
odology of science been applied to the study of various areas of human 
behavior. Since these are newer areas of investigation, their results have 
not achieved the acceptance and status that come with the greater maturity 
and longer tradition of the physical sciences. 

The uniformity of nature is a reasonable assumption in the world of 
physical objects and their characteristics, but in the area of social behavior 
such assumptions are not warranted. Human nature is much more complex 
than the sum of its many discrete elements, even if they could be isolated 
and identified. Because human nature is so complex, it is much more 
difficult to develop sound theories of human behavior than to predict 
occurrences in the physical world. Research on human subjects has
numerous problems.

1. No two persons are alike in feelings, drives, or emotions. What may
be a reasonable prediction for one may be useless for another.

2. No one person is completely consistent from one moment to another. 
Human behavior is influenced by the interaction of the individual 
with every changing element in his or her environment, often in a 
way that is difficult to predict. 

3. Human beings are influenced by the research process itself. They are 
influenced by the attention that is focused on them when under in- 
vestigation and by the knowledge that their behavior is being observed. 

4. The behavioral sciences have been limited by a lack of adequate
definition. Accurate operational definitions are essential to the
development of a sophisticated science. Such traits as intelligence, learning,
hostility, anxiety, or motivation are not directly observable and are 
generally referred to as "constructs," implying that they are construc- 
tions of the scientist’s imagination. Constructs cannot be seen, heard, 
or felt. They can only be inferred by phenomena such as test scores 
or by observed hostile or aggressive acts, skin responses, pulse rates, 
or persistence at a task. 


But even constructs for which useful descriptive instruments are avail- 
able account for only limited sources of variation; they yield only partial 
definitions. For example, intelligence, as defined by a score on an
intelligence test, is not a satisfactory measure of the type of intelligence that
individuals are called upon to demonstrate in a variety of situations outside 
a formal academic environment. 

In the physical sciences, many complex constructs have been more 
effectively defined in operational terms. Time is one such construct: Time 
is a function of the motion of the earth in relation to the sun, measured 
by the rotation of a hand on the face of a circular scale in precise units. 
Weight is a construct involving the laws of gravitation, measured by springs, 
torsion devices, levers, or electronic adaptations of these instruments. 

The instruments which measure such constructs are devised so that 
they are consistent, to a maximum degree, with known physical laws and 
forces, and yield valid descriptions in a variety of situations. An interna- 
tional bureau prescribes standards for these devices so that they may pro- 
vide precise operational definitions of the constructs. 

Although the problems of discovering theories of human behavior 
are difficult, they may not be insolvable. Behavioral scientists need to carry 
on their investigations as carefully and rigorously as have physical scientists. 
However, one must not overestimate the exactness of the physical sciences, 
for theoretical speculations and probability estimates are also inherent char- 
acteristics. 

Today we live in a world that has benefited greatly from progress 
made by the biological and physical sciences. Infant mortality is decreasing,
and life expectancy continues to increase. Surgery is now performed on
fetuses in utero to correct such conditions as hydrocephalus. Children born 
prematurely weighing less than 1000 grams (approximately 2 pounds) sur- 
vive and generally thrive. The Salk and Sabin vaccines promise to rid the 
world of poliomyelitis. Many forms of cancer are being conquered by early 
detection and chemotherapy. Improved nutrition, antibiotics, innovative 
surgical techniques, and countless other accomplishments allow us to lead 
longer, healthier lives. Automation and computerization touch every aspect 
of our lives, reducing our physical labor and increasing our leisure time.
The splitting of the atom, space travel, and developments in the field of 
electronics such as the laser, superconductivity, and the silicon chip, prom- 
ise improvements and adventures that are beyond the scope of most peo- 
ple's imagination. All these improvements have resulted from the investi- 
gation of biological and physical sciences. 

However, there is less confidence about the improvement of the non- 
physical aspects of our world. Despite all their marvelous gadgets, there is 
some doubt whether people are happier or more satisfied or whether their 
basic needs are being fulfilled more effectively today than they were a 
century ago. The fear of nuclear plant failures and the uncertainty about
the safe disposal of nuclear waste are uppermost in the minds of people
throughout the world. Our apparent inability to solve various social
problems raises the specter of malnutrition, terrorism, and illiteracy. There is
great concern that our children are not learning sufficiently to compete in 
our more technologically complex society. Standard scores indicate that 
high school children are less prepared for college today than were their 
parents and older siblings. 

Scientific methods must be applied with greater vigor and imagination 
to the behavioral aspects of our culture. The development of the behavioral
sciences and their application to education and other human affairs present
some of our greatest challenges.


THE ROLE OF THEORY 


At this stage in the discussion, a statement about theory is appropriate. To 
many people the term theory suggests an ivory tower, something unreal and 
of little practical value. On the contrary, a theory establishes a cause and 
effect relationship between variables with the purpose of explaining and 
predicting phenomena. Those who engage in pure research devote their 
energies to the formulation and reformulation of theories and may not be 
concerned with their practical applications. However, when a theory has 
been established, it may suggest many applications of practical value. John 
Dewey once said that there was nothing more practical than a good theory. 

Theories about the relationship between the position of the earth and
other moving celestial bodies were essential to the successful launching and
return of manned space vehicles. Theories of the behavior of gases were 
essential to the development of refrigeration and air conditioning. Con- 
trolled atomic energy could not have been achieved without the establish- 
ment of theories about the nature of mass and energy and the structure 
of the atom. The real purpose of scientific methods is prediction, the 
discovery of certain theories or generalizations that anticipate future oc- 
currences with maximum probability. 

Piaget’s theory of cognitive development is a good example of a theory 
that has been developed with little or no concern for application. Only one 
of Piaget’s many books discussed education in any great detail (Piaget,
1970), and even this book does not deal with the specifics that most teachers 
need. However, innumerable books, chapters, and articles written by fol- 
lowers of Piaget have explicated the usefulness of his theory for teaching 
practices from preschool (e.g., Kamii, 1973; Lavatelli, 1973) to high school 
(e.g., Karplus, et al., 1977; Staver & Gabel, 1979), and even for teaching 
mentally retarded (e.g., Kahn, 1984, 1987) and other handicapped children 
(e.g., Wolinsky, 1970). So although Piaget’s aim was to understand the
cognitive structures and functioning of children and adults, his theory has 
been embraced by educators and psychologists who have investigated ways 
in which his theory could be used to improve educational practice. 

But what do we mean by the term theory? A theory is an attempt to 
develop a general explanation for some phenomenon. A theory defines 
nonobservable constructs that are inferred from observable facts and events 
and that are thought to have an effect on the phenomenon under study. 
A theory describes the relationship among key variables for purposes of 
explaining a current state or predicting future occurrences. A theory is 
primarily concerned with explanation and therefore focuses on determin- 
ing cause-effect relationships. 


THE HYPOTHESIS 


Two important functions that hypotheses serve in scientific inquiry are the 
development of theory and the statement of parts of an existing theory in 
testable form. Snow (1973) describes six levels of theory, with the first level 
being hypothesis formation. At this initial level, the theory developer has 
a hunch based on past experience, observations, and/or information gained 
from others. A hypothesis is formulated in such a way that this hunch can 
be tested. Based upon the findings of the subsequent research, the
hypothesis is supported or rejected and more hypotheses are formulated to
continue the process of building a cohesive theory. 

The more common use of hypotheses is to test whether an existing 
theory can be used to solve a problem. In everyday situations, those who
confront problems often propose informal hypotheses that can be tested
directly. For example, when a lamp fails to light when the switch is turned 
on, several hypotheses come to mind based upon our understanding of 
electricity and our past experiences with lamps: 


1. The plug is not properly connected to the wall outlet.

2. The bulb is burned out.

3. The fuse is burned out or the circuit breaker has been tripped.

4. There has been a power failure in the neighborhood.


Each of these speculations can be tested directly by checking the plug 
connection, substituting a bulb known to be in working condition, inspect- 
ing the fuse or circuit breaker, or by noting whether or not other lights in 
the house or in neighbors’ houses are on. 


The Research Hypothesis 


The research or scientific hypothesis is a formal affirmative statement pre- 
dicting a single research outcome, a tentative explanation of the relationship 
between two or more variables. For the hypothesis to be testable, the var- 
iables must be operationally defined. That is, the researcher specifies what 
operations were conducted, or tests used, to measure each variable. Thus, 
the hypothesis focuses the investigation on a definite target and determines 
what observations, or measures, are to be used. 

A number of years ago the hypothesis was formulated that there is a 
positive causal relationship between cigarette smoking and the incidence 
of coronary heart disease. This hypothesis proposed a tentative explanation 
that led to many studies comparing the incidence of heart disease among 
cigarette smokers and nonsmokers. As a result of these extensive studies, 
the medical profession now generally accepts that this relationship has been 
established. 

In the behavioral sciences, the variables may be abstractions that can- 
not be observed. These variables must be defined operationally by describ- 
ing some samples of actual behavior that are concrete enough to be ob- 
served directly. The relationship between these observable incidents may 
be deduced as consistent or inconsistent with the consequences of the hy- 
pothesis. Thus, the hypothesis may be judged to be probably true or prob- 
ably false. 

For example, one might propose the hypothesis that third-grade chil- 
dren taught the Chisanbop hand-calculating process would learn to per- 
form the basic arithmetic processes more effectively (that is, score higher 
on a specified measure or test of arithmetic processing) than those using 
the conventional method. Children would be randomly assigned to two
groups, one taught the Chisanbop system (experimental group) and the
other using the conventional method (control group). The experiment
would be carried on for a period of nine months. If the hypothesis were
true, the experimental group's mean scores on a standardized arithmetic
achievement test would be significantly higher than those of the control
group.

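As a rough illustration of the comparison just described, the following Python sketch asks how often a difference in group means at least as large as the observed one would arise if group membership were reshuffled at random. The scores are invented for illustration only, the function name permutation_p_value is hypothetical, and the permutation test is simply one convenient procedure; the text does not specify which statistical test would be applied.

```python
import random
from statistics import mean


def permutation_p_value(experimental, control, n_shuffles=10_000, seed=0):
    """Return the observed mean difference and a one-sided p-value.

    The p-value is the share of random regroupings of the pooled scores
    whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(experimental) - mean(control)
    pooled = list(experimental) + list(control)
    n_exp = len(experimental)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # regroup the scores purely by chance
        diff = mean(pooled[:n_exp]) - mean(pooled[n_exp:])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_shuffles


# Made-up scores for illustration only; not data from any study.
chisanbop_scores = [78, 85, 82, 90, 74, 88, 81, 79]
conventional_scores = [72, 80, 75, 83, 70, 77, 76, 74]

observed_diff, p_value = permutation_p_value(chisanbop_scores, conventional_scores)
print(f"mean difference = {observed_diff:.2f}, one-sided p = {p_value:.4f}")
```

A small p-value would suggest that a difference this large is unlikely to come from the chance makeup of the groups alone. It would not by itself show that the teaching method caused the difference.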

The Null Hypothesis (H0)


In the early stage of their study, researchers state an affirmative scientific
or research hypothesis as a prediction of the outcome that they propose
to test. Later, at the stage of the statistical analysis of the observed data,
they restate the hypothesis in negative or null form. For example, the
previous research hypothesis would be restated: There is no significant
difference between the arithmetic achievement of students taught the
Chisanbop system of hand calculation and those using the conventional method.

The null hypothesis relates to a statistical method of interpreting
conclusions about population characteristics that are inferred from the
variable relationships observed in samples. The null hypothesis asserts that
observed differences or relationships merely result from chance errors
inherent in the sampling process. Most hypotheses are the opposite of the
null hypothesis. If the researcher rejects the null hypothesis, he or she
accepts the research hypothesis, concluding that the magnitude of the
observed variable relationship is probably too great to attribute to sampling
error.

The logic of the use of the null hypothesis, which may be confusing
to students, is explained in greater detail in the discussions of sampling
error and the central limit theorem in Chapter 9.


SAMPLING


The primary purpose of research is to discover principles that have uni- 
versal application, but to study a whole population to arrive at generali- 
zations would be impracticable, if not impossible. Some populations are so 
large that their characteristics cannot be measured; before the measure- 
ment could be completed, the populations would have changed. 

Imagine the difficulty of conducting a reading experiment with all 
American fifth-grade children as subjects. The study of a population of 
this size would require the services of thousands of researchers, the
expenditure of millions of dollars, and hundreds of thousands of class hours.

Fortunately, the process of sampling makes it possible to draw valid
inferences or generalizations on the basis of careful observation of variables
within a relatively small proportion of the population. A measured value
based upon sample data is a statistic. A population value inferred from a
statistic is a parameter.

A population is any group of individuals that have one or more
characteristics in common that are of interest to the researcher. The population
may be all the individuals of a particular type, or a more restricted part of 
that group. All public schoolteachers, all male secondary schoolteachers, 
all elementary schoolteachers, or all Chicago kindergarten teachers may be 
populations. 

A sample is a small proportion of a population selected for observation 
and analysis. By observing the characteristics of the sample, one can make
certain inferences about the characteristics of the population from which 
it is drawn. Contrary to some popular opinion, samples are not selected 
haphazardly; they are chosen in a systematically random way, so that chance 
or the operation of probability can be utilized. 
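
The distinction between a statistic and a parameter can be sketched in a few lines of Python. The population below is simulated with made-up scores, so its mean can be computed directly; in actual research the parameter is usually unknown and must be inferred from the sample statistic. All variable names are hypothetical.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical population: 10,000 simulated test scores (illustration only).
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Parameter: a value that describes the whole population.
population_mean = mean(population)

# Statistic: the same quantity computed from a random sample of 200.
sample = rng.sample(population, 200)
sample_mean = mean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

The two printed values will usually be close but not identical, which is why a sample of a few hundred can stand in for a much larger population.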


RANDOMNESS 


The concept of randomness has been basic to scientific observation and 
research. It is based on the assumption that, while individual events cannot 
be predicted with accuracy, aggregate events can. For instance, although
one may not be able to predict an individual’s academic achievement with
great accuracy, one can predict accurately the average academic
performance of a group.

Randomization has two important applications in research: 


1. Selecting a group of individuals for observation who are represent- 
ative of the population about which the researcher wishes to gener- 
alize; or 

2. Equating experimental and control groups in an experiment. Assign- 
ing individuals by random assignment is the best method of providing 
for their equivalence. 


It is important to note that a random sample is not necessarily an
identical representation of the population. Characteristics of successive
random samples drawn from the same population may differ to some degree,
but it is possible to estimate their variation from the population
characteristics and from each other. The variation, known as sampling error,
does not suggest that a mistake has been made in the sampling process.
Rather, sampling error refers to the chance variations that occur in
sampling; with randomization, these variations are predictable and taken
into account in data analysis techniques.

The topic of sampling error is considered in greater detail in Chapter
9 in the discussion of the central limit theorem, the standard error of the
mean, and the level of significance.

The Simple Random Sample


The individual observations or individuals are chosen in such a way that 
each has an equal chance of being selected, and each choice is independent 
of any other choice. If we wished to draw a sample of 50 individuals from
a population of 600 students enrolled in a school, we could place the 600 
names in a container and, blindfolded, draw one name at a time until the 
sample of 50 was selected. This procedure is cumbersome and is rarely 
used. 


Random Numbers


A more convenient way of selecting a random sample, or assigning indi- 
viduals to experimental and control groups so that they are equated, is by 
the use of a table of random numbers. Many such tables have been gen- 
erated by computers producing a random sequence of digits. The million 
random digits with 100,000 normal deviates of the Rand Corporation (1965) 
and Fisher and Yates (1963) Statistical tables for biological, agricultural and 
medical research are frequently used. 

When using a table, it is necessary to assign consecutive numbers to
each member of the population from which the sample is to be selected.
Then, entering the table at any page, row, or column, the researcher can
select the sample by reading two digits for populations up to 99 in
number, three digits for 001 to 999, and four digits for 0001 to 9999.
When a duplicated number or a number larger than the population size is
encountered, it is skipped and the process continues until the desired
sample size is selected.

As an illustration, let us assume that a sample of 30 is to be selected
from a serially numbered population of 835. Using a portion of a table of 
random numbers reproduced here, 30 three-digit numbers are selected by 
reading from left to right. When using the table of random numbers to 
select a sample, one must number the population members serially. Then, 
enter the table at any page, row, or column at random, and the sample 
can be selected by reading to the left, right, up, down, or diagonally. For 
populations up to 99 in number, two digits are selected; from 001 to 999, 
three digits; and from 0001 to 9999, four digits. 

These 30 numbered members of the population comprise the sample. 
If this group were to be divided into two equated groups of 15 each, the 
first 15 could compose one group and the second 15 the other. There are 
many varieties of random assignment, such as assigning the odd numbers 
to one group (1, 3, 5, 7, . . .) and the even numbers (2, 4, 6, 8, . . .) to the
other. 
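
The table-reading procedure described above can be sketched in a few lines of Python. This is only an illustration: the rows are those printed in Table 1-1, while the function name select_from_table and the decision to skip 000 are assumptions made for the sketch, not part of the original procedure.

```python
# Rows of Table 1-1, copied as printed.
TABLE_ROWS = [
    "50393 13330 92982 17442 63378 02050",
    "09038 31974 22381 24289 72341 61530",
    "82066 06997 44590 23445 72731 61407",
    "91340 84979 39117 89344 46694 95596",
]


def select_from_table(rows, population_size, sample_size, width=3):
    """Read fixed-width numbers left to right, skipping unusable ones."""
    selected, skipped = [], []
    for row in rows:
        digits = row.replace(" ", "")
        for i in range(0, len(digits) - width + 1, width):
            number = int(digits[i:i + width])
            # Assumption: 000 is not a valid serial number, so it is skipped
            # along with out-of-range values and duplicates.
            if number == 0 or number > population_size or number in selected:
                skipped.append(number)
            else:
                selected.append(number)
            if len(selected) == sample_size:
                return selected, skipped
    raise ValueError("table exhausted before the sample was complete")


sample, skipped = select_from_table(TABLE_ROWS, population_size=835, sample_size=30)
group_a, group_b = sample[:15], sample[15:]  # first 15 and second 15

print(sample)
print(skipped)
```

Run against the four printed rows, the sketch stops at the thirtieth usable number, and the eight numbers it skips are the same eight listed in the note beneath Table 1-1.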

TABLE 1-1 An Abbreviated Table of Random Numbers

50393 13330 92982 17442 63378 02050
09038 31974 22381 24289 72341 61530
82066 06997 44590 23445 72731 61407
91340 84979 39117 89344 46694 95596

THE SAMPLE (read down each column; deleted numbers in parentheses)

 503   426   197   161   590  (913)  444
(931)  337   422   530  (234)  408   669
 333   802   381   820   457   497
 092   050   242   660   273  (939)
(982)  090  (897)  699  (161)  117
 174   383   234   744   407  (893)

In selecting this sample, eight numbers were deleted. Numbers 931, 982,
897, 913, 939, and 893 were deleted because they were larger than the
population of 835 described. Numbers 234 and 161 were deleted because
they duplicate previous selections.

For those with access to a computer, many packaged computer programs
include the capability to produce a random numbers table. A simple
program can generate a table of random numbers designed for a particular
study. As an example, assume that a random sample of 30 is to be selected
from a serially numbered population of 585 (1 to 585). The sample is
generated on an Apple computer with this program:


100 HOME
105 REM PRINT 30 RANDOM INTEGERS FROM 1 TO 585
110 FOR I = 1 TO 30
120 X = INT (RND (1) * 585 + 1)
130 PRINT X
140 NEXT I


This program will then randomly produce 30 numbers ranging from 
a possible 1 to 585. The output looks like this: 


419 549 393 
340 363 428 
432 576 248 
219 134 173 
264 26 126 
49 544 540 
47 134 323 
415 559 167 
385 376 323 
354 554 88 


It is apparent that in order to select a random sample, one must not 
consciously select any particular individual or observation. The size of the 
sample may or may not be significantly related to its adequacy. A large
sample, carelessly selected, may be biased and inaccurate, whereas a smaller
one, carefully selected, may be relatively unbiased and accurate enough to 
make satisfactory inference possible. However, a well-selected large sample 
will be more representative of the population than a well-selected smaller 
sample. This is explained in greater detail in the discussions of sampling 
error and the central limit theorem in Chapter 9. 

In addition to caution in the sampling process, definition of the pop- 
ulation about which inferences are to be made is extremely important. 
When the now defunct Literary Digest drew its sample for the purpose of 
predicting the results of the 1936 presidential election, subjects were chosen 
from the pages of telephone directories and from automobile registration 
lists. The prediction of Alfred Landon's victory over Franklin D. Roosevelt 
proved to be wrong, and a postelection analysis revealed that the population 
for which the prediction was made was not the same population sampled. 
Large numbers of eligible voters did not own automobiles and were not 
telephone subscribers, and consequently were not included in the sample. 
In fact, the resulting sample was systematically biased to overrepresent the 
wealthy and underrepresent the poor and unemployed. 


The Systematic Sample 


If a population can be accurately listed or is finite, a type of systematic 
selection will provide what approximates a random sample. A systematic 
sample consists of the selection of each nth term from a list. For example, 
if a sample of 200 were to be selected from a telephone directory with 
200,000 listings, one would choose the first name at random from a
randomly selected page. Then every thousandth
name would be selected until the sample of 200 names was complete. If 
the last page were reached before the desired number had been selected, 
the count would continue from the first page of the directory. Systematic 
samples of automobile owners could be selected in similar fashion from a 
state licensing bureau list or file, or a sample of eighth-grade students from 
a school attendance roll. 
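
The directory example can be sketched in Python as follows. This is a minimal illustration under stated assumptions: listings are represented only by their positions (0 to 199,999), the function name systematic_sample is hypothetical, and the random starting point is chosen directly rather than by first choosing a page.

```python
import random


def systematic_sample(listing_count, sample_size, rng):
    """Return 0-based positions: a random start, then every k-th listing."""
    interval = listing_count // sample_size   # 200,000 // 200 = 1,000
    start = rng.randrange(listing_count)      # randomly chosen first name
    # Wrap around to the front of the list when the end is passed.
    return [(start + i * interval) % listing_count for i in range(sample_size)]


rng = random.Random(7)  # fixed seed so the sketch is repeatable
positions = systematic_sample(listing_count=200_000, sample_size=200, rng=rng)
print(positions[:5])
```

The modulo step corresponds to continuing the count from the first page of the directory once the last page has been reached.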


The Stratified Random Sample


At times it is advisable to subdivide the population into smaller homoge- 
neous groups to get more accurate representation. This method results in 
the stratified random sample. For example, in an income study of wage earners 
in a community, a true sample would approximate the same relative number 
from each socioeconomic level of the whole community. If, in the com- 
munity, the proportion were 15 percent professional workers, 10 percent 
managers, 20 percent skilled workers, and 55 percent unskilled workers, 
the sample should include approximately the same proportions in order 
to be considered representative. Within each subgroup a random selection
should be used. Thus, for a sample of 100, the researcher would randomly
select 15 professional workers from the subpopulation of all professional 
workers in the community, 10 managers from that subpopulation, and so 
on. This process gives the researcher a more representative sample than 
one selected from the entire community, which might be unduly weighted 
by a preponderance of unskilled workers. 
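
The proportional allocation in this example might be sketched in Python as shown here. The proportions are those given above; the community size of 20,000 and the roster of (label, id) pairs are hypothetical stand-ins for a real list of wage earners.

```python
import random

# Proportions from the example; the community size is a hypothetical figure.
PROPORTIONS = {
    "professional": 0.15,
    "manager": 0.10,
    "skilled": 0.20,
    "unskilled": 0.55,
}
COMMUNITY_SIZE = 20_000
SAMPLE_SIZE = 100

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Build a hypothetical roster: each stratum is a list of (label, id) pairs.
strata = {
    name: [(name, i) for i in range(round(share * COMMUNITY_SIZE))]
    for name, share in PROPORTIONS.items()
}

sample = []
for name, members in strata.items():
    quota = round(PROPORTIONS[name] * SAMPLE_SIZE)  # 15, 10, 20, and 55
    sample.extend(rng.sample(members, quota))       # random draw within the stratum

counts = {name: sum(1 for label, _ in sample if label == name) for name in strata}
print(counts)
```

With other proportions or sample sizes the rounded quotas may not add up exactly to the intended total, so a real study would need a rule for adjusting them.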

In addition to, or instead of, socioeconomic status, such characteristics 
as age, sex, extent of formal education, racial origin, religious or political 
affiliation, or rural-urban residence might provide a basis for choosing a 
stratified sample. The characteristics of the entire population, together with 
the purposes of the study, must be carefully considered before a stratified 
sample is decided upon. 


The Area or Cluster Sample 


The area or cluster sample is a variation of the simple random sample that 
is particularly appropriate when the population of interest is infinite, when 
a list of the members of the population does not exist, or when the geo- 
graphic distribution of the individuals is widely scattered. Suppose, for the 
purpose of a survey, we wanted to select a sample of all public school 
elementary teachers in the United States. A simple random sample would 
be impracticable. 

From the 50 states a random sample of 20 could be selected. From 
these 20 states, all counties could be listed and a random sample of 80 
counties selected. From the 80 counties, all the school districts could be 
listed and a random sample of 30 school districts selected. It would not be 
difficult to compile a list of all elementary teachers from the 30 school 
districts and to select a random sample of 500 teachers. This successive 
random sampling of states, counties, school districts, and finally, of indi- 
viduals would involve a relatively efficient and inexpensive method of se- 
lecting a sample of individuals. 
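
The successive stages just described can be sketched in Python. Everything about the sampling frame below is hypothetical: the numbers of counties, districts, and teachers are invented so that the sketch can run, and in practice they would come from published directories. Only the stage sizes (20 states, 80 counties, 30 districts, 500 teachers) are taken from the example.

```python
import random

rng = random.Random(11)  # fixed seed so the sketch is repeatable


def build_frame():
    """Hypothetical frame: state -> county -> district -> list of teachers."""
    frame = {}
    for s in range(50):
        counties = {}
        for c in range(rng.randint(20, 60)):
            districts = {}
            for d in range(rng.randint(1, 5)):
                key = f"S{s}-C{c}-D{d}"
                districts[key] = [f"{key}-T{t}" for t in range(rng.randint(20, 60))]
            counties[f"S{s}-C{c}"] = districts
        frame[f"S{s}"] = counties
    return frame


frame = build_frame()

# Stage 1: a random sample of 20 states.
states = rng.sample(sorted(frame), 20)
# Stage 2: list all counties in those states and sample 80.
county_list = [(s, c) for s in states for c in frame[s]]
counties = rng.sample(county_list, 80)
# Stage 3: list all districts in those counties and sample 30.
district_list = [(s, c, d) for s, c in counties for d in frame[s][c]]
districts = rng.sample(district_list, 30)
# Stage 4: list all teachers in those districts and sample 500.
teachers = [t for s, c, d in districts for t in frame[s][c][d]]
sample = rng.sample(teachers, 500)

print(len(states), len(counties), len(districts), len(sample))
```

Only the units selected at one stage need to be listed at the next, which is the source of the method's economy.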

This method of sampling is likely to introduce an element of sample 
bias because of the unequal size of some of the subsets selected. Only when 
a simple random sample would be impracticable is this method
recommended.


Nonprobability Samples 


Nonprobability samples are those that use whatever subjects are available, 
rather than following a specific subject selection process. Some nonprob- 
ability sampling procedures may produce samples that do not accurately 
reflect the characteristics of a population of interest. Such samples may 
lead to unwarranted generalizations and should not be used if random 
selection is practicable. 




Educational researchers, because of administrative limitations in ran- 
domly selecting and assigning individuals to experimental and control groups, 
often use available classes as samples. The status of groups may be equated 
by such statistical means as the analysis of covariance (discussed in Chapter 
9). In certain types of descriptive studies, the use of available samples may 
restrict generalizations to similar populations. For example, when a psy- 
chology professor uses students from Introduction to Psychology classes as 
subjects, the professor may safely generalize only to other similar groups
of psychology students.

A sample made up of those who volunteer to participate in a study 
may represent a biased sample. Volunteers are not representative of a total 
population, for volunteering results in a selection of individuals who are 
different and who really represent a population of volunteers. In a sense, 
those who respond to a mailed questionnaire are volunteers and may not 
reflect the characteristics of all who were on the mailing list. It may be 
desirable to send another copy of the instrument to nonrespondents with 
an appeal for their participation. 


Sample Size 


There is usually a trade-off between the desirability of a large sample and 
the feasibility of a small one. The ideal sample is large enough to serve as 
an adequate representation of the population about which the researcher 
wishes to generalize and small enough to be selected economically—in 
terms of subject availability, expense in both time and money, and com- 
plexity of data analysis. There is no fixed number or percentage of subjects 
that determines the size of an adequate sample. It may depend upon the 
nature of the population of interest or the data to be gathered and analyzed. 
A national opinion poll randomly selects a sample of about 1500 subjects 
as a reflection of the opinions of a population of more than 150 million 
United States citizens of voting age, with an error factor from 2 to 3 percent. 
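
The error factor quoted for a sample of about 1500 can be checked with the usual formula for the margin of error of a proportion under simple random sampling. The sketch below assumes a 95 percent confidence level and the most conservative case, an even split of opinion (p = 0.5); neither figure is stated in the text, and the function name margin_of_error is illustrative.

```python
import math
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 percent
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(1500)
print(f"n=1500: +/- {moe * 100:.1f} percentage points")
```

Under these assumptions the result falls between 2 and 3 percentage points. The size of the population does not enter the formula, which is why a sample of this size can serve for a very large population.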

Before the second decade of the twentieth century, statisticians be- 
lieved that samples should be relatively large so that the normal probability 
table could be used to estimate sampling error, explained by the central 
limit theorem. (See Chapter 9 for a discussion of sampling error and
Student's t distribution.) The work of William Sealy Gosset in 1915, in
which he developed data on the probability distribution of small sample
means (Student's t distribution), led to the effective use of small
samples. Gosset's
contribution made feasible research studies that necessarily had to be lim- 
ited to a small number of subjects. Small-sample research has made a 
significant contribution to statistical analysis of research data, particularly 
in experimental studies. 

It is often stated that samples of 30 or more are to be considered large
samples and those with fewer than 30, small samples. It is approximately
at this sample size of 30 that the magnitude of Student's t critical
values for small samples approaches the z critical values of the normal
probability table for large samples. (See Chapter 9 for a discussion of z
and t critical values.)

More important than size is the care with which the sample is selected. 
The ideal method is random selection, letting chance or the laws of prob- 
ability determine which members of the population are to be selected. When 
random sampling is employed, whether the sample is large or small, the 
errors of sampling may be estimated, giving researchers an idea of the 
confidence that they may place in their findings. 

In summary, several practical observations about sample size are listed: 


1. The larger the sample, the smaller the magnitude of sampling error.

2. Survey-type studies probably should have larger samples than needed 
in experimental studies. 

3. When sample groups are to be subdivided into smaller groups to be 
compared, the researcher initially should select large enough samples 
so that the subgroups are of adequate size for his or her purpose. 

4. In mailed questionnaire studies, because the percentage of responses 
may be as low as 20 to 30 percent, a larger initial sample mailing is 
indicated. 

5. Subject availability and cost factors are legitimate considerations in 
determining appropriate sample size. 
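
The first observation in the list above can be illustrated with a short simulation. The Python sketch below draws repeated simple random samples from a hypothetical population of 10,000 test scores (mean 100, standard deviation 15, both invented for the illustration) and compares the observed spread of the sample means with the standard error of the mean, the population standard deviation divided by the square root of the sample size.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 test scores, roughly normal (mean 100, SD 15).
population = [rng.gauss(100, 15) for _ in range(10_000)]
pop_sd = statistics.pstdev(population)

spread = {}
for n in (25, 100):
    # Means of 2,000 simple random samples of size n.
    means = [statistics.mean(rng.sample(population, n)) for _ in range(2000)]
    spread[n] = statistics.stdev(means)   # observed sampling error
    expected = pop_sd / n ** 0.5          # standard error of the mean
    print(f"n={n:3d}  observed={spread[n]:.2f}  sigma/sqrt(n)={expected:.2f}")
```

Quadrupling the sample size from 25 to 100 roughly halves the spread of the sample means, so the gain in precision from each added subject diminishes as the sample grows.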


WHAT IS RESEARCH? 


How is research related to scientific method? The terms research and scientific 
method are sometimes used synonymously in educational discussions. Al- 
though it is true that the terms have some common elements of meaning, 
a distinction is helpful. 

For the purposes of this discussion, research is considered to be the 
more formal, systematic, and intensive process of carrying on a scientific 
method of analysis. Scientific method in problem solving may be an infor- 
mal application of problem identification, hypothesis formulation, obser- 
vation, analysis, and conclusion. You could reach a conclusion why your 
car wouldn't start or why a fire occurred in an unoccupied house by em- 
ploying a scientific method, but the processes involved probably would not 
be as structured as those of research. Research is a more systematic activity 
that is directed toward discovery and the development of an organized 
body of knowledge. Research may be defined as the systematic and objective analysis 
and recording of controlled observations that may lead to the development of gen- 
eralizations, principles, or theories, resulting in prediction and possibly ultimate 
control of events.

Because definitions of this sort are rather abstract, a summary of some
of the characteristics of research may help to clarify its spirit and
meaning.

1. Research is directed toward the solution of a problem. The ultimate
goal is to discover cause-and-effect relationships between variables,
though researchers often have to settle for the useful discovery of a
systematic relationship because the evidence for a cause-and-effect
relationship is insufficient.

2. Research emphasizes the development of generalizations, principles,
or theories that will be helpful in predicting future occurrences.
Research usually goes beyond the specific objects, groups, or situations
investigated and infers characteristics of a target population from the
sample observed. Research is more than information retrieval, the
simple gathering of information. Although many school research
departments gather and tabulate statistical information that may be
useful in decision making, these activities are not properly termed
research.

3. Research is based upon observable experience or empirical evidence.
Certain interesting questions do not lend themselves to research
procedures because they cannot be observed. Research rejects revelation
and dogma as methods of establishing knowledge and accepts only
what can be verified by observation.

4. Research demands accurate observation and description. Researchers
use quantitative measuring devices, the most precise form of
description. When this is not possible or appropriate, they use qualitative
or nonquantitative descriptions of their observations. They select or
devise valid data-gathering procedures and, when feasible, employ
mechanical, electronic, or psychometric devices to refine observation,
description, and analysis of data.

5. Research involves gathering new data from primary or firsthand sources
or using existing data for a new purpose. Teachers frequently assign
a so-called research project that involves writing a paper dealing with
the life of a prominent person. The students are expected to read a
number of encyclopedias, books, or periodical references and to
synthesize the information in a written report. This is not research, for
the data are not new. Merely reorganizing or restating what is already
known and has already been written, valuable as it may be as a learning
experience, is not research. It adds nothing to what is known.

6. Although research activity may at times be somewhat random and
unsystematic, it is more often characterized by carefully designed
procedures that apply rigorous analysis. Although trial and error are
often involved, research is rarely a blind, shotgun investigation or an
experiment just to see what happens.

7. Research requires expertise. The researcher knows what is already
known about the problem and how others have investigated it. He or
she has searched the related literature carefully and is also thoroughly
grounded in the terminology, concepts, and technical skills necessary
to understand and analyze the data gathered.

8. Research strives to be objective and logical, applying every possible
test to validate the procedures employed, the data collected, and the
conclusions reached. The researcher attempts to eliminate personal
bias. There is no attempt to persuade or to prove an emotionally held
conviction. The emphasis is on testing rather than on proving the
hypothesis. Although absolute objectivity is as elusive as pure
righteousness, the researcher tries to suppress bias and emotion in his or
her analysis.

9. Research involves the quest for answers to unsolved problems.
Pushing back the frontiers of ignorance is its goal, and originality is
frequently the quality of a good research project. However, previous
important studies are deliberately repeated, using identical or similar
procedures, with different subjects, different settings, and at a
different time. This process is replication, a fusion of the words repetition
and duplication. Replication is always desirable to confirm or to raise
questions about the conclusions of a previous study.

10. Research is characterized by patient and unhurried activity. It is rarely
spectacular, and researchers must expect disappointment and
discouragement as they pursue the answers to difficult questions.

11. Research is carefully recorded and reported. Each important term is
defined, limiting factors are recognized, procedures are described in
detail, references are carefully documented, results are objectively
recorded, and conclusions are presented with scholarly caution and
restraint. The written report and accompanying data are made
available to the scrutiny of associates or other scholars. Any competent
scholar will have the information necessary to analyze, evaluate, and
even replicate the study.

12. Research sometimes requires courage. The history of science reveals
that many important discoveries were made in spite of the opposition
of political and religious authorities. The Polish scientist Copernicus
(1473-1543) was condemned by church authorities when he
announced his conclusion concerning the nature of the solar system.
His theory, in direct conflict with the older Ptolemaic theory, held
that the sun, not the earth, was the center of the solar system.
Copernicus angered supporters of prevailing religious dogma, who viewed
his theory as a denial of the story of creation as described in the book
of Genesis. Modern researchers in such fields as genetics, sexual
behavior, and even business practices have aroused violent criticism from
those whose personal convictions, experiences, or observations were
in conflict with some of the research conclusions.


The rigorous standards of scientific research are apparent from an 
examination of these characteristics. The research worker should be a schol- 
arly, imaginative person of the highest integrity, who is willing to spend 
long hours painstakingly seeking truth. However, it must be recognized 
that researchers are human beings. The ideals that have been listed are 
probably never completely realized. Like righteousness, they are goals to 
strive for and are not all achieved by every researcher. 

Many people have a superficial concept of research, picturing research 
workers as strange introverted individuals who, shunning the company of 
their fellows, find refuge in their laboratory. There, surrounded by test 
tubes, retorts, beakers, and other gadgets, they carry on their mysterious 
activities. In reality the picture is quite different. Research is not all mys- 
terious, and it is carried on by thousands of quite normal individuals, more 
often in teams than alone, very often in the factory, the school, or the 
community, as well as in the laboratory. Its importance is attested to by the 
tremendous amounts of time, manpower, and money spent on research by 
industry, universities, government agencies, and the professions. The key 
to the cultural development of the Western world has been research, the
reduction of areas of ignorance by discovering new truths, which in turn 
lead to better predictions, better ways of doing things, and new and better 
products. We recognize the fruits of research: better consumer products, 
better ways of preventing and treating disease, better ways of understand- 
ing the behavior of individuals and groups, and a better understanding of 
the world in which we live. In the field of education, we identify research 
with a better understanding of the individual and a better understanding 
of the teaching-learning process and the conditions under which it is most 
successfully carried on. 


PURPOSES OF RESEARCH


Fundamental or Basic Research 


To this point we have described research in its more formal aspects. Re- 
search has drawn its pattern and spirit from the physical sciences and has 
represented a rigorous, structured type of analysis. We have presented the
goal of research as the development of theories by the discovery of broad 
generalizations or principles. We have employed careful sampling proce- 
dures in order to extend the findings beyond the group or situation studied. 
So far, our discussion has shown little concern for the application of the 
findings to actual problems in areas considered to be the concern of people
other than the investigator. This methodology is the approach of basic or
fundamental research. 

Fundamental research is usually carried on in a laboratory situation, 
sometimes with animals as subjects. This type of research has been primarily 
the activity of psychologists rather than educators. 


Applied Research 


Applied research has most of the characteristics of fundamental research, 
including the use of sampling techniques and the subsequent inferences 
about the target population. However, its purpose is improving a product 
or a process, testing theoretical concepts in actual problem situations. Most
educational research is applied research, for it attempts to develop gen- 
eralizations about teaching-learning processes and instructional materials. 

Fundamental research in the behavioral sciences may be concerned 
with the development and testing of theories of behavior. Educational
research is concerned with the development and testing of theories of how
students behave in an educational setting. 


Action Research 


Since the late 1930s the fields of social psychology and education have 
shown great interest in what has been called action research. In education
this movement has had as its goal the involvement of both research specialist 
and classroom teacher in the study and application of research to educa- 
tional problems in a particular classroom setting. 

Action research is focused on immediate application, not on the de- 
velopment of theory or on general application. It has placed its emphasis 
on a problem here and now in a local setting. Its findings are to be evaluated
in terms of local applicability, not universal validity. Its purpose is to
improve school practices and, at the same time, to improve those who try to
improve the practices: to combine the research processes, habits of thinking,
ability to work harmoniously with others, and professional spirit.

If most classroom teachers are to be involved in research activity, it
will probably be in the area of action research. Modest studies may be made 
for the purpose of trying to improve local classroom practices. It is not 
likely that many teachers will have the time, resources, or technical back- 
ground to engage in the more formal aspects of research activity. Fun- 
damental research must continue to make its essential contribution to be- 
havioral theory, and applied research to the improvement of educational 
practices. These activities, however, will be primarily the function of re- 
search specialists, many of them subsidized by universities, private and 
government agencies, professional associations, and philanthropic
foundations.

Many observers have deprecated action research as nothing more than
the application of common sense or good management. But whether or 
not it is worthy of the term research, it does apply scientific thinking and 
methods to real-life problems and represents a great improvement over 
teachers’ subjective judgments and decisions based upon folklore and lim- 
ited personal experiences. 

In concluding this discussion, it is important to realize that research 
may be carried on at various levels of complexity. Respectable research 
studies may be the simple descriptive fact-finding variety that lead to useful 
generalizations. Actually, many of the early studies in the behavioral sci- 
ences were useful in providing needed generalizations about the behavior 
or characteristics of individuals and groups. Subsequent experimental stud- 
ies of a more complex nature needed this groundwork information to 
suggest hypotheses for more precise analysis. For example, descriptive 
studies of the intellectually gifted, carried on since the early 1920s by the 
late Lewis M. Terman and his associates, have provided useful generali- 
zations about the characteristics of this segment of the school population. 
Although these studies did not explain the factors underlying giftedness, 
they did provide many hypotheses to be investigated by more sophisticated 
experimental methods. 


ASSESSMENT, EVALUATION, 
AND DESCRIPTIVE RESEARCH 


The term descriptive research has often been used incorrectly to describe 
three types of investigation that are basically different. Perhaps their su- 
perficial similarities have obscured their differences. Each of them employs 
the process of disciplined inquiry through the gathering and analysis of 
empirical data and each attempts to develop knowledge. To be done com- 
petently, each requires the expertise of the careful and systematic inves- 
tigator. A brief explanation may serve to put each one in proper perspec- 
tive. 

Assessment is a fact-finding activity that describes conditions that exist
at a particular time. No hypotheses are proposed or tested, no variable
relationships are examined, and no recommendations for action are
suggested.

The national census is a massive assessment type of investigation con- 
ducted by the Bureau of the Census, a division of the United States De- 
partment of Commerce. Every 10 years an enumeration of the population 
is conducted, with data classified by nationality, citizenship, age, sex, race, 
marital status, educational level, regional and community residence, em- 
ployment, economic status, births, deaths, and other characteristics. These 
data provide a valuable basis for social analysis and government action.

In education, assessment may be concerned with the determination
of progress that students have made toward educational goals. The National 
Assessment of Educational Progress (NAEP), originally known as the Com- 
mittee on Assessment of the Progress of Education, has been financed by 
the National Center for Educational Statistics. Since 1969 a nationwide 
testing program has been conducted in such fields as science, mathematics, 
literature, reading, and social studies, in four age groupings, in various 
geographical areas of the country, in communities of various sizes, and in 
particular states, and has reported interesting evidence of the degree to 
which learning goals have or have not been realized. 

Evaluation is concerned with the application of its findings and implies 
some judgment of the effectiveness, social utility, or desirability of a prod- 
uct, process, or program in terms of carefully defined and agreed-upon 
objectives or values. It may involve recommendations for action. It is not 
concerned with generalizations that may be extended to other settings. In 
education, it may seek answers to such questions as: How well is the science 
program developing the competencies that have been agreed upon by the 
faculty curriculum committee? Should the program in vocational agricul- 
ture education be dropped? Are the library facilities adequate? Should the 
reading textbook series currently in use be retained? 

Descriptive research, unlike assessment and evaluation, is concerned 
with all of the following: hypothesis formulation and testing, the analysis 
of the relationships between nonmanipulated variables, and the develop- 
ment of generalization. It is this last characteristic that most distinguishes 
descriptive research from assessment and evaluation. While assessment and 
evaluation studies may include other characteristics of descriptive research, 
only descriptive research, of the three, has generalization as its goal. Unlike 
the experimental method, in which variables are deliberately arranged and 
manipulated through the intervention of the researcher, in descriptive 
research variables that exist or have already occurred are selected and 
observed. This process is described as ex post facto, explanatory observational, 
or causal-comparative research in Chapter 4. Both descriptive and experi- 
mental methods employ careful sampling procedures so that generaliza- 
tions may be extended to other individuals, groups, times, or settings. 


TYPES OF EDUCATIONAL RESEARCH 


Any attempt to classify types of educational research poses a difficult prob- 
lem. The fact that practically every textbook suggests a different system of 
classification provides convincing evidence that there is no generally ac- 
cepted scheme. 

To systematize a method of presentation, however, some pattern is 
desirable. At the risk of seeming arbitrary, and with a recognition of the 
danger of oversimplification, we suggest a framework that might clarify 
understanding of basic principles of research methodology. It should be 
noted that the system of classification is not important in itself but only has 
value in making the analysis of research processes more comprehensible. 

Actually, all research involves the elements of observation, descrip- 
tion, and the analysis of what happens under certain circumstances. A 
rather simple three-point analysis may be used to classify educational 
research. Practically all studies fall under one, or a combination, of these 
types. 


1. Historical research describes what was. The process involves investigat- 
ing, recording, analyzing, and interpreting the events of the past for 
the purpose of discovering generalizations that are helpful in under- 
standing the past and the present, and, to a limited extent, in antic- 
ipating the future. 

2. Descriptive research describes what is, describing, recording, analyzing, 
and interpreting conditions that exist. It involves some type of com- 
parison or contrast and attempts to discover relationships between 
existing nonmanipulated variables. 

3. Experimental research describes what will be when certain variables are 
carefully controlled or manipulated. The focus is on variable rela- 
tionships. As defined here, deliberate manipulation is always a part 
of the experimental method. 


A complete chapter is devoted to each of these types of research, to 
techniques of data gathering, to areas of application, and to methods of 
analysis. 


SUMMARY 


Human beings’ desire to know more about their world has led them from 
primitive superstition to modern scientific knowledge. From mysticism, 
dogma, and the limitations of unsystematic observation based upon per- 
sonal experience, they have examined the process of thinking itself to 
develop the method of deductive-inductive thinking, which has become 
the foundation of scientific method. Although first applied as a method of 
the physical sciences, the process of scientific inquiry has also become the 
prevailing method of the behavioral sciences. 

There is no single scientific method, for scientists carry on their in- 
vestigations in a number of ways. However, accuracy of observation and 
the qualities of imagination, creativity, objectivity, and patience are some 
of the common ingredients of all scientific methods. 

The hypothesis is an essential research device that gives a focus to 
the investigation and permits researchers to reach probability conclusions. 
After researchers formulate an affirmative research hypothesis at the outset 
of their project, they restate the hypothesis in negative or null form for 
the purposes of statistical analysis of their observations. This procedure 
facilitates inferring population characteristics from observed variable re- 
lationships as they relate to the error inherent in the sampling process. 
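
The restatement of a research hypothesis in null form can be illustrated with a short sketch. The example uses a randomization (permutation) test, a technique chosen here only for illustration and not prescribed in this chapter. The function name `permutation_test`, its parameters, and the two sets of scores are hypothetical. Under the null hypothesis the group labels are interchangeable, so the scores are shuffled repeatedly to see how often chance alone produces a difference in means at least as large as the one observed.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_resamples=2000, seed=0):
    """Estimate the proportion of random relabelings whose absolute
    difference in means is at least as large as the observed one.

    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_resamples):
        # Under the null hypothesis, any split of the pooled scores
        # into two groups of these sizes is equally likely.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_resamples


# Hypothetical test scores for two small groups of pupils.
scores_a = [78, 85, 90, 72, 88, 81]
scores_b = [70, 75, 83, 68, 74, 79]
print(permutation_test(scores_a, scores_b))
```

A small proportion suggests that the observed difference is unlikely to be the product of sampling error alone. It does not prove the research hypothesis; it only makes the null hypothesis less tenable.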

Sampling, a deliberate rather than haphazard method of selecting 
subjects for observation, enables the scientist to infer conclusions about a 
population of interest from the observed characteristics of a relatively small 
number of cases. Simple random, systematic, stratified random, area or 
cluster, and available (nonprobability) samples have been described. Meth- 
ods of determining the size of an appropriate sample are suggested and 
the sources of sample bias are considered. 
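
Three of the probability sampling procedures named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a recommended procedure: the function names, the roster of 320 numbered students, and the two strata are hypothetical, and the stratified version assumes proportional allocation, that is, the same fraction drawn from every stratum.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n cases without replacement; every case has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=None):
    """Take every k-th case after a random start, with k = len(population) // n.

    When the population size is not a multiple of n, cases beyond
    position n * k are never reached; this sketch accepts that limitation.
    """
    k = len(population) // n
    if k < 1:
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]


def stratified_random_sample(strata, fraction, seed=None):
    """Draw the same fraction at random from each stratum.

    `strata` maps a stratum label to the list of its members.
    """
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        n = round(len(members) * fraction)
        sample.extend(rng.sample(members, n))
    return sample


# Hypothetical roster: 320 students identified by number.
roster = list(range(1, 321))
print(simple_random_sample(roster, 40, seed=1)[:5])
print(systematic_sample(roster, 40, seed=1)[:5])

# Hypothetical strata of 200 and 120 students; one-eighth drawn from each.
strata = {"group_1": roster[:200], "group_2": roster[200:]}
print(len(stratified_random_sample(strata, 0.125, seed=1)))
```

Fixing the seed makes a draw reproducible, which allows the selection to be documented and checked. The systematic version assumes that the roster is not ordered in a cycle that coincides with the sampling interval.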

Research has been defined as the systematic and objective analysis and 
recording of controlled observations that may lead to the development of generali- 
zations, principles, or theories, resulting in prediction and possibly ultimate control 
of events. The characteristics of research that may help to clarify its spirit 
and meaning have been presented. 

Fundamental or basic research is the formal and systematic process of 
deductive-inductive analysis, leading to the development of theories. Applied 
research adapts the theories, developed through fundamental research, to 
the solution of problems. Action research, which may fail to attain the 
rigorous qualities of fundamental and applied research, attempts to apply 
the spirit of scientific method to the solution of problems in a particular 
setting, without any assumptions about the general application of findings 
beyond the situation studied. 

In this chapter we have established assessment, evaluation, and de- 
scriptive research as three distinct types of investigation, and we have clas- 
sified research as historical, descriptive, or experimental. 

Remember that research is essentially an intellectual and creative ac- 
tivity. The mastery of techniques and processes does not confer research 
competence, though these skills may help the creative problem-solver to 
reach his or her objectives more efficiently. 


EXERCISES 


1. Construct two syllogisms: 
a. one that is sound 
b. one that is faulty. Indicate the nature of the fallacy. 
2. Illustrate the application of Dewey's steps in problem solving. Choose one of 
the problems listed, or one of your own: 
a. brown patches on your lawn 
b. your car won't start when you leave for home 
c. an economical buy on canned peaches 
d. members of your class failed an examination 
3. Give an example of: 
a. a pure research problem 
b. an applied research problem 
c. an action research problem 

4. To what extent have religious institutions resisted the claims of science? 

5. Is there necessarily a conflict between the disciplines of the sciences and the 
humanities? 

6. Agree or disagree with the following statements: 
a. Too much effort is spent on the development of theories, because they 
don't usually work in real situations. 
b. Science is more properly thought of as a method of problem solving than 
as a field of knowledge. 
c. Applied research is more important than pure research in contributing to 
human welfare. 

7. How would you select a sample of 40 college students for a morale study from 
a freshman class of 320? 

8. From a metropolitan school district staff directory, you wish to select a sample 
of 300 teachers from a listing of 3800. Discuss several ways that the sample 
could be selected, considering the issues that may be involved. 

9. What are the distinctive characteristics of descriptive research as contrasted 
with: 

a. assessment 
b. evaluation 
c. experimental research 
10. How is the term research sometimes misused in classroom assignments and 
television interviews? 


SELECTING A PROBLEM 
AND PREPARING 
A RESEARCH PROPOSAL 


One of the most difficult phases of the graduate research project is the 
choice of a suitable problem. Beginners are likely to select a problem that 
is much too broad in scope. This may be due to their lack of understanding 
of the nature of research and systematic problem-solving activity. It may 
also be due to their enthusiastic but naive desire to solve an important 
problem quickly and immediately. 

Those who are more experienced know that research is often tedious, 
painfully slow, and rarely spectacular. They realize that the search for truth 
and the solution of important problems take a great deal of time and energy 
and the intensive application of logical thinking. Research makes its con- 
tribution to human welfare by countless small additions to knowledge. The 
researcher has some of the characteristics of the ant, which brings its single 
grain of sand to the anthill. 

Before considering the ways in which problems may be identified, we 
should discuss a few of the characteristics of research and the activities of 
the researcher. Research is more often a team endeavor than an individual 
activity. Researchers working in groups attack problems in different ways, 
pooling their knowledge and ideas and sharing the results of their efforts. 
Highly publicized discoveries usually result from the cumulative efforts of 
many, working as teams over long periods of time. They are rarely the 
product of a single individual working in isolation. 

Great discoveries rarely happen by accident. When they do, the re- 
searcher is usually well-grounded and possesses the ability, known as ser- 
endipity, to recognize the significance of these fortunate occurrences. He 
or she is imaginative enough to seize the opportunity presented and to 
carry it through to a fruitful conclusion. Pasteur observed that chance favors 
the prepared mind. 

Researchers are specialists rather than generalists. They employ the 
principle of the rifle rather than the shotgun, analyzing limited aspects of 
broad problems. Critics have complained that much social research consists 
of learning more and more about less and less until the researcher knows 
everything about nothing. This is a clever statement but an exaggeration. 
The opposite statement, equally clever and exaggerated, characterizes much 
ineffective problem solving: learning less and less about more and more 
until one knows nothing about everything. 

There is a danger, however, that research activity may focus upon 
such fragmentary aspects of a problem that it has little relevance to the 
formulation of a general theory. An analysis of the relationship among a 
few isolated factors in a complex situation may seem attractive as a research 
project, but it will make little or no contribution to a body of knowledge. 
Research is more than compiling, counting, and tabulating data. It involves 
deducing the consequences of hypotheses through careful observation and 
the application of rigorous logic. 

It is sometimes important to discover that a generalization is probably 
not true. Beginning researchers frequently associate this type of conclusion 
with a sense of personal failure, for they become emotionally committed 
to their hypotheses. Research, however, is a process of testing rather than 
proving, and it implies an objectivity that lets the data lead where they will. 


THE ACADEMIC RESEARCH PROBLEM 


Academic research projects have been subjected to much criticism, both by 
the academic community and by the general public. The academic research 
project is usually undertaken in partial fulfillment of the requirements 
of a graduate course or an advanced degree. The initial motivation 
may not be the desire to engage in research but the practical need of 
meeting a requirement. Unfortunately, few such studies make a significant 
contribution to the development or refinement of knowledge or to the 
improvement of practice. The lack of time, financial resources, experience, 
and expertise of the researcher, and the academic hazard of departing 
from a relatively safe, short-range project are, understandably, hindrances 
to significant contributions. But these projects are often justified on the 
grounds that once students develop some research competency they will 
use their "know-how" to seek solutions to basic problems and will make a 
contribution to the body of knowledge upon which sound practices are 
based. Too often this expectation is not realized, for too few students carry 
on further studies. 

Few graduate students in education are full-time students; conse- 
quently they are often victims of the competing demands of teaching, 
supervising student activities, attending meetings, and participating in ad- 
ministrative activity. Many are not on campus while they are writing their 
thesis or dissertation, and they miss the continuing intellectual stimulation 
of the university faculty, discussions with fellow graduate students, the 
ready availability of library resources, and the opportunity of the full-time 
student to absorb the scholarly atmosphere of the university community. 
Thus, most graduate students tend to select narrow, practical problems 
that are closely related to their school experience but are frequently low- 
level investigations, with little relevance to theory. 

Perhaps more significant, master's degree or doctoral studies are car- 
ried on under the direction of an advisor or major professor who is devoting 
his or her own energies to research on a significant problem. The efforts 
of degree candidates thus can be directed toward certain restricted phases 
of the major problem, making possible long-term longitudinal studies. Such 
studies as those by the late Lewis M. Terman at Stanford University of 
gifted children, followed over 50 years, represent the cumulative attack 
that is likely to yield more significant results than the uncoordinated in- 
vestigations of candidates whose efforts lack this unifying direction and 
continuity. 


Levels of Research Projects 


In the light of the varied types and purposes of student projects, choice 
of a problem will depend upon the level at which the research is done. A 
problem appropriate for a beginner in a first course in research is different 
from that selected for the more rigorous requirements of the master's thesis 
or the doctoral dissertation. The first topic will necessarily be a modest one 
which can be carried on by an inexperienced researcher in a limited period 
of time. The emphasis will be placed upon the learning process of the 
beginning researcher rather than on his or her actual contribution to ed- 
ucation. This statement does not imply that the product is unimportant. 
It merely recognizes that, because of the limitations of the first research 
project, the emphasis is on learning how, with the hope that subsequent 
investigations will progressively yield more significant contributions to the 
advancement of knowledge. 


Some students choose a first problem that can be expanded later into 
a more comprehensive treatment at the level of the master's thesis or the 
doctoral dissertation. The first study thus serves as an exploratory process. 


Sources of Problems 


The choice of a suitable problem is always difficult. Few beginners possess 
real problem awareness, and even the more experienced researcher hesi- 
tates at this step. It is a serious responsibility to commit oneself to a problem 
that will inevitably require much time and energy and that is so academically 
significant. 

What are the most likely sources to which one may go for a suitable 
research problem, or from which one may develop a sense of problem 
awareness? 

Many of the problems confronted in the classroom, the school, or the 
community lend themselves to investigation, and they are perhaps more 
appropriate for the beginning researcher than are problems more remote 
from his or her own teaching experience. What organizational or management 
procedures are employed? How is learning material presented? To what 
extent does one method yield more effective results than another? How 
do teachers feel about these procedures? How do pupils and parents feel 
about them? What out-of-school activities and influences seem to affect 
students and the teaching-learning process? 

Teachers will discover "acres of diamonds" in their own backyards, 
and an inquisitive and imaginative mind may discover in one of these 
problem areas an interesting and worthwhile research project. 

Technological changes and curricular developments are constantly 
bringing forth new problems and new opportunities for research. Perhaps 
more than ever before, educational innovations are being advocated in 
classroom organization, in teaching materials and procedures, and in the 
application of technical devices and equipment. Such innovations as 
computer-assisted instruction, teaching by television, programmed instruction, 
modified alphabets, new subject matter concepts and approaches, flexible 
scheduling, and team teaching need to be carefully evaluated through the 
research process. 

The graduate academic experience should stimulate the questioning 
attitude toward prevailing practices and effectively promote problem 
awareness. Classroom lectures, class discussions, seminar reports, and out- 
of-class exchanges of ideas with fellow students and professors will suggest 
many stimulating problems to be solved. Students who are fortunate enough 
to have graduate assistantships have a special opportunity to profit from 
the stimulation of close professional relationships with faculty members 
and fellow assistants. 

Reading assignments in textbooks, special assignments, research 
reports, and term papers will suggest additional areas of needed research. 
Research articles often suggest techniques and procedures for the attack 
on other problems. A critical evaluation may reveal faults or defects that 
made published findings inconclusive or misleading. Many research articles 
suggest problems for further investigation that may prove fruitful. 

Consultation with the course instructor, advisor, or major professor 
is helpful. Although the student should not expect research problems to 
be assigned, consultation with a faculty member is desirable. Most students 
feel insecure as they approach the choice of a research problem. They 
wonder if the problem they may have in mind is significant enough, feasible, 
and reasonably free of unknown hazards. To expect the beginner to arrive 
at the advisor's office with a completely acceptable problem is quite un- 
realistic. One of the most important functions of the research advisor is to 
help students clarify their thinking, achieve a sense of focus, and develop 
a manageable problem from one that may be too vague and complex. 

The following list may suggest areas from which research problems 
may be further defined. 


1. Programmed instruction; scrambled texts; teaching machines; 
computer-assisted instruction 

2. Television instruction; closed circuit TV 

3. Modified alphabets: Unifon, Initial Teaching Alphabet 

4. Flexible scheduling 

5. Team teaching 

6. Evaluation of learning; reporting to parents 

7. Student regulation/control 

8. Learning styles 

9. Evaluation of learning; practices and philosophies 

10. Homework policies and practices 

11. Field trips 

12. School buildings and facilities; lighting; space; safety 

13. Extracurricular programs 

14. Student out-of-school activities: employment; recreation; cultural ac- 
tivity; reading; television viewing 

15. Teacher out-of-school activities: employment; political activity; recreation 

16. The open classroom 

17. Linguistics 

18. New approaches to biology/chemistry/physics 

19. Language laboratories: foreign languages; reading 

20. Multiple textbooks 

21. Independent study programs 

22. Advanced placement program 

23. Audiovisual programs 

24. Sociometry 

25. Health services 

26. Guidance-counseling programs 

27. Teacher morale: annoyances and satisfactions 

28. Teacher welfare: salaries; merit rating; retirement; tenure 

29. Educational organizations: local, state, and national; NEA; AFT 

30. Inner-city schools; the culturally deprived; Head Start; Upward Bound; 
tutoring 

31. Preservice education of teachers: student teaching 

32. Teacher attitudes on a variety of issues, e.g., mainstreaming 

33. In-service programs 

34. Racial integration: student; teacher 

35. Parochial/private school problems; tax credits 

36. Follow-up of graduates; early school leavers 

37. Religion and education: released time programs; dismissed time; shared 
time 

38. Non-school-sponsored social organizations or clubs 

39. School district reorganization 

40. Community pressures on the school: academic freedom; controversial 
issues 

41. Legal liability of teachers 

42. Cadet teaching; teacher recruitment 

43. Teaching internship 

44. Sex education 

45. Ability grouping: acceleration; retardation/promotion 

46. Special education: speech therapy; clinical services; social services 

47. Problems in higher education: selection; prediction of success; 
graduate programs 

48. Work-study programs 

49. Attribution of success and failure 

50. Comparison of the effectiveness of two teaching methods/procedures 

51. Self-image analysis 

52. Vocational objectives of students 

53. History of an institution, program, or organization 

54. Factors associated with the selection of teaching/nursing/social work 
as a career 

55. Case studies 

56. Socioeconomic status and academic achievement 

57. Perceptions of administrative leadership 

58. The effect of stress on academic achievement 

59. Minimal competency tests for promotion and/or graduation 

60. Merit pay for teachers 

For those students who are not teachers, some of the problem areas 
listed may be appropriate in social agency, hospital, or industrial situations. 

Keep in mind that the above list includes general topics that need a 
great deal of refinement in order to become researchable problems. The 
student will usually need the help of a faculty member in gradually refining 
the general topic into a useful statement of a research problem. 

In order to take a general topic or problem, such as those just listed, 
and refine it into a researchable problem, the individual needs to define 
certain components of the problem: the population of interest, the situa- 
tion, what part of the issue is to be addressed in the first (or next) study, 
and so forth. 

For example, number 49 deals with the issue of attribution of success 
and failure. To make this a researchable problem requires a good deal of 
narrowing and refinement. One researchable problem that can be derived 
from this broad topic (using the approach referred to in the previous 
paragraph) would ask the question, Will college freshmen who are inter- 
nally focused (those who attribute their successes and failures to themselves) 
do better in their first year of college than those who are externally focused 
(those who attribute their successes and failures to external factors)? An- 
other equally plausible research question from this same topic would be, 
Do learning-disabled adolescents differ from nondisabled adolescents on 
a measure of attribution? As can be seen, a large number of researchable 
problems can be derived from this topic. Only by narrowing the focus (e.g., 
population, situation, measurements, etc.) can a researchable problem be 
derived. 

Once the scope of the topic or problem has been narrowed to make 
it a potentially researchable problem, we can then determine its importance 
and feasibility. 


Evaluating the Problem 


Before the proposed research problem can be considered appropriate, 
several searching questions should be raised. Only when those questions 
are answered in the affirmative can the problem be considered a good one. 


1. Is this the type of problem that can be effectively solved through the 
process of research? Can relevant data be gathered to test the theory 
or find the answer to the question under consideration? 


2. Is the problem significant? Is an important principle involved? Would 
the solution make any difference as far as educational theory or 
practice is concerned? If not, there are undoubtedly more significant 
problems waiting to be investigated. 

3. Is the problem a new one? Is the answer already available? Ignorance 
of prior studies may lead a student to spend time needlessly on a 
problem already investigated by some other worker. However, 
although novelty or originality is an important consideration, the fact 
that a problem has been investigated in the past does not mean that 
it is no longer worthy of study. There are times when it is appropriate 
to replicate (repeat) a study to verify its conclusions or to extend the 
validity of its findings to a different situation or population. For 
instance, research with nonhandicapped children might be of great 
importance to replicate with mentally retarded children. Similarly, 
much cross-cultural research consists of replicating research 
conducted in one country with samples in another country. Kohlberg's 
(1969) theory of moral reasoning has been shown to be valid in a 
number of countries, thereby supporting the universality of the 
theory. 

4. Is research on the problem feasible? After a research project has been 
evaluated, there remains the problem of suitability for a particular 
researcher. The student should ask: Although the problem may be a 
good one, is it a good problem for me? Will I be able to carry it 
through to a successful conclusion? Some of the questions the students 
should consider are the following: 

a. Am I competent to plan and carry out a study of this type? Do I 
know enough about this field to understand its significant aspects 
and to interpret my findings? Am I skillful enough to develop, 
administer, and interpret the necessary data-gathering devices 
and procedures? Am I well grounded in the necessary knowledge 
of research design and statistical procedures? 

b. Are pertinent data accessible? Are valid and reliable data-gath- 
ering devices and procedures available? Will school authorities 
permit me to contact the students, conduct necessary experiments 
or administer necessary tests, interview teachers, or have access 
to important cumulative records? Will I be able to get the spon- 
sorship necessary to open doors that otherwise would be closed 
to me? 

c. Will I have the necessary financial resources to carry on this study? 
What will be the expense involved in data-gathering equipment, 
printing, test materials, travel, and clerical help? If the project is 
an expensive one, what is the possibility of getting a grant from 
a philanthropic foundation or from such governmental agencies 
as the National Institute of Education? 


d. Will I have enough time to complete the project? Will there be 
time to devise the procedures, select the data-gathering devices, 
gather and analyze the data, and complete the research report? 
Since most academic programs impose time limitations, certain 
worthwhile projects of a longitudinal type are precluded. 

e. Will I have the courage and determination to pursue the study 
in spite of the difficulties and social hazards that may be involved? 
Will I be willing to work aggressively when data are difficult to 
gather and when others are reluctant to cooperate? Sex education, 
racial integration, and other controversial problem areas, how- 
ever, may not be appropriate for a beginning research project. 


THE RESEARCH PROPOSAL 


The preparation of a research proposal is an important step in the research 
process. Many institutions require that a proposal be submitted before any 
project is approved. This provides a basis for the evaluation of the project 
and gives the advisor a basis for assistance during the period of his or her 
direction. It also provides a systematic plan of procedure for the researcher 
to follow. 

The proposal is comparable to the blueprint which the architect 
prepares before the bids are let and building commences. The initial draft 
proposal is subject to modification in the light of the analysis by the student 
and his or her project advisor. Because good research must be carefully 
planned and systematically carried out, procedures that are improvised 
from step to step will not suffice. A worthwhile research project is likely 
to result only from a well-designed proposal. 

The seven-part proposal format presented here should not be 
considered the only satisfactory sequence. Many institutions suggest other 
formats for the research proposal. 


Part 1: The statement of the problem. This is usually a declarative 
statement but may be in question form. This attempt to focus on a stated goal 
gives direction to the research process. It must be limited enough in scope 


A problem suggests a specific answer or conclusion. Usually a con- 
troversy or a difference of opinion exists. A cause-and-effect relationship 
may be suggested upon the basis of theory or previous research findings. 
Personal observation and experience may be the basis of a problem. Some 
examples of problem statements are as follows: (1) Children who have had 
kindergarten experience might demonstrate greater academic achievement 
in the first grade than those who have not had this experience. (2) Partic- 
ipation in high school competitive athletics may be detrimental to academic 
achievement. (3) Racial segregation may have a damaging effect upon the 
self-image of minority group children. (4) Knowledge of participation in 
an experiment may have a stimulating effect upon the reading achievement 
of participants. These problem statements involve more than information 
gathering. They suggest answers or conclusions and provide a focus for 
research activity. 


Part 2: The significance of the problem. It is important that the researcher 
point out how the solution to the problem or the answer to the question 
can influence educational theory or practice. That is, the researcher must 
demonstrate why it is worth the time, effort, and expense required to carry 
out the proposed research. Careful formulation and presentation of the 
implications or possible applications of knowledge helps to give the project 
an urgency, justifying its worth. 

Failure to include this step in the proposal may well leave the re- 
searcher with a problem without significance—a search for data of little 
ultimate value. Many of the tabulating or "social bookkeeping" research 
problems should be abandoned if they do not pass the critical test of sig- 
nificance. Perhaps university library shelves would not groan with the weight 
of so many unread and forgotten dissertations if this criterion of 
significance had been rigorously applied. With so many gaps in educational 
theory, and so many areas of educational practice in need of analysis, there is 
little justification for the expenditure of research effort on trivial or su- 
perficial investigations. 


Part 3: Definitions, assumptions, limitations, and delimitations. It is impor- 
tant to define all unusual terms that could be misinterpreted. These def- 
initions help to establish the frame of reference with which the researcher 
approaches the problem. The variables to be considered should be defined 
in operational terms. Such expressions as academic achievement and in- 
telligence are useful concepts, but they cannot be used as criteria unless 
they are defined as observable samples of behavior. Academic grades as- 
signed by teachers or scores on standardized achievement tests are oper- 
ational definitions of achievement. A score on a standardized intelligence 
test is an operational definition of intelligence. 

Assumptions are statements of what the researcher believes to be facts 
but cannot verify. A researcher may state the assumption that the partic- 
ipant observers in the classroom, after a period of three days, will establish 
rapport with the students and will not have a reactive effect on the behavior 
to be observed.

Limitations are those conditions beyond the control of the researcher
that may place restrictions on the conclusions of the study and their ap- 
plication to other situations. Administrative policies that preclude using 
more than one class in an experiment, a data-gathering instrument that 
has not been validated, or the inability to randomly select and assign subjects 
to experimental and control groups are examples of limitations. 

Delimitations are the boundaries of the study. A study of attitudes
toward racial minorities may be concerned only with middle-class,
fifth-grade pupils, and conclusions are not to be extended beyond the
population sampled.


Part 4: Review of related literature. A summary of the writings of rec- 
ognized authorities and of previous research provides evidence that the 
researcher is familiar with what is already known and what is still unknown 
and untested. Since effective research is based upon past knowledge, this 
step helps to eliminate the duplication of what has been done and provides 
useful hypotheses and helpful suggestions for significant investigation. Cit- 
ing studies that show substantial agreement and those that seem to present 
conflicting conclusions helps to sharpen and define understanding of ex- 
isting knowledge in the problem area, provides a background for the re- 
search project, and makes the reader aware of the status of the issue. 
Parading a long list of annotated studies relating to the problem is inef- 
fective and inappropriate. Only those studies that are plainly relevant, 
competently executed, and clearly reported should be included. 

In searching related literature, the researcher should note certain 
important elements: 


1. Reports of studies of closely related problems that have been
investigated
2. Design of the study, including procedures employed and data-gathering
instruments used
3. Populations that were sampled and sampling methods employed
4. Variables that were defined
5. Extraneous variables that could have affected the findings
6. Faults that could have been avoided
7. Recommendations for further research


Capitalizing on the reviews of expert researchers can be fruitful in
providing helpful ideas and suggestions. Although review articles that sum- 
marize related studies are useful, they do not provide a satisfactory sub- 
stitute for an independent search. Even though the review of related lit- 
erature is presented as step 4 in the finished research proposal, the search 
for related literature is one of the first steps in the research process. It is
a valuable guide to defining the problem, recognizing its significance, and
suggesting promising data-gathering devices, an appropriate study design, and
sources of data.


Part 5: The hypothesis. It is appropriate here to formulate a major 
hypothesis and possibly several minor hypotheses. This approach further 
clarifies the nature of the problem and the logic underlying the investi- 
gation, and gives direction to the data-gathering process. A good hypothesis 
has several basic characteristics: 


1. It should be reasonable.
2. It should be consistent with known facts or theories.
3. It should be stated in such a way that it can be tested and found to
be probably true or probably false.
4. It should be stated in the simplest possible terms.


The research hypothesis is a tentative answer to a question. It is an 
educated guess or hunch, generally based upon prior research and/or the- 
ory, to be subjected to the process of verification or disconfirmation. The 
gathering of data and the logical analysis of data relationships provide a 
method of confirming or disconfirming the hypothesis by deducing its 
consequences. 

It is important that the hypothesis be formulated before data are 
gathered. Suppose that the researcher gathers some data and, on the basis 
of these, notes something that looks like the basis for an alternative hy- 
pothesis. Since any particular set of observations may display an extreme 
distribution, using such observations to test the hypothesis would possibly 
lead to an unwarranted conclusion. 

The formulation of the hypothesis in advance of the data-gathering 
process is necessary for an unbiased investigation. It is not inappropriate 
to formulate additional hypotheses after data are collected, but they should 
be tested on the basis of new data, not on the old data that suggested them. 
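
The warning about testing a hypothesis on the data that suggested it can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the number of comparisons, the group size, and the function names are all hypothetical, and every comparison is constructed so that no true difference exists.

```python
import random
import statistics


def mean_difference(rng, n=20):
    """Difference between two group means when no true effect exists."""
    group_a = [rng.gauss(0, 1) for _ in range(n)]
    group_b = [rng.gauss(0, 1) for _ in range(n)]
    return statistics.mean(group_a) - statistics.mean(group_b)


def explore_then_retest(n_comparisons=30, seed=1):
    """Pick the most striking difference in old data, then gather new data."""
    rng = random.Random(seed)
    # "Old" data: many comparisons examined after the fact.
    old = [mean_difference(rng) for _ in range(n_comparisons)]
    # The hypothesis is suggested by the largest difference observed.
    chosen = max(range(n_comparisons), key=lambda i: abs(old[i]))
    # New data are gathered only for the comparison that was chosen.
    new = mean_difference(rng)
    return chosen, old[chosen], new


chosen, old_diff, new_diff = explore_then_retest()
print(f"comparison {chosen}: old data {old_diff:+.2f}, new data {new_diff:+.2f}")
```

Because the chosen comparison was selected for being extreme, its difference in the old data overstates what fresh data will typically show; trying several seeds usually makes this shrinkage visible.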


Part 6: Methods. This part of the research proposal usually consists
of three parts: subjects, procedures, and data analysis. The subjects section
details the population from which the researcher plans to select the sample.
Variables that are frequently reported, depending on the type of project
proposed, include chronological age, grade level, socioeconomic status,
sex, race, IQ (if other than average), mental age (if significantly different
from chronological age), academic achievement level, and other pertinent
attributes of the targeted population. The number of subjects desired from
the population and how they will be selected are also indicated in this
section. The reader should be able to understand exactly where the subjects
will come from and how they are to be selected.

The procedures section outlines the research plan. It describes in detail
what will be done, how it will be done, what data will be needed, and what 
data-gathering devices will be used (see Chapter 7). The method of ana- 
lyzing the data is described in detail in the third part of the methods section. 
The information given in the data-analysis section should be specific and 
detailed enough to demonstrate to the reader exactly what is planned. No 
details should be left open to question. 
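
One way to make the selection plan in the subjects section explicit is to write it out as a procedure. The following is a minimal sketch, assuming the population can be listed as identification numbers and that two equal groups are wanted; the function name, the group labels, and the seed are hypothetical choices for illustration.

```python
import random


def select_and_assign(population, sample_size, seed=None):
    """Randomly select a sample, then split it into two groups."""
    rng = random.Random(seed)
    # Random selection: every member has an equal chance of being drawn,
    # and the drawn list is already in random order.
    sample = rng.sample(population, sample_size)
    # Random assignment: first half experimental, second half control.
    half = sample_size // 2
    return {"experimental": sample[:half], "control": sample[half:]}


# Hypothetical population of 200 pupils identified by number.
pupils = list(range(1, 201))
groups = select_and_assign(pupils, 40, seed=7)
print(len(groups["experimental"]), len(groups["control"]))
```

Recording the seed in the proposal lets a reader reproduce exactly which subjects were drawn.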


Part 7: Time schedule. Although this step may not be required by the 
study advisor, a time schedule should be prepared so that the researcher 
may budget his or her time and energy effectively. Dividing the project 
into manageable parts and assigning dates for their completion helps to 
systematize the study and minimize the natural tendency to procrastinate. 

Some phases of the project cannot be started until other phases have 
been completed. Such parts of the final research report as the review of 
related literature can be completed and typed while waiting for the data- 
gathering process. If the project is complicated, a flow chart or time-task 
chart may be useful in describing the sequence of events. Since academic 
research projects usually involve critical time limitations and definite dead- 
lines for filing the completed report, the planning of procedures with 
definite date goals is most important. From time to time the major professor 
or advisor may request a progress report. This device also serves as a 
stimulus, helping the researcher to move systematically toward the goal of 
a completed project. 
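
The time-task chart mentioned above can be sketched as a small calculation. This is an illustrative sketch only: the phase names, durations in weeks, and prerequisites are hypothetical, and it assumes Python 3.9 or later for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical phases: name -> (duration in weeks, phases that must finish first)
PHASES = {
    "review of literature": (4, []),
    "proposal approval": (2, ["review of literature"]),
    "instrument construction": (3, ["proposal approval"]),
    "data gathering": (5, ["instrument construction"]),
    "write literature chapter": (3, ["review of literature"]),
    "data analysis": (3, ["data gathering"]),
    "final report": (4, ["data analysis", "write literature chapter"]),
}


def schedule(phases):
    """Return (phase, start week, finish week) with prerequisites respected."""
    graph = {name: deps for name, (_, deps) in phases.items()}
    finish = {}
    plan = []
    # static_order() yields each phase only after all of its prerequisites.
    for name in TopologicalSorter(graph).static_order():
        duration, deps = phases[name]
        start = max((finish[d] for d in deps), default=0)
        finish[name] = start + duration
        plan.append((name, start, finish[name]))
    return plan


for name, start, end in schedule(PHASES):
    print(f"weeks {start:2d}-{end:2d}  {name}")
```

In this hypothetical plan the literature chapter is written while instruments are built and data are gathered, which is the kind of overlap the paragraph above recommends.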


ETHICS IN HUMAN EXPERIMENTATION


In planning a research project involving human subjects, it is important to 
consider the ethical guidelines designed to protect your subjects. In par- 
ticular, medical and psychological experimentation using human subjects 
involves some element of risk, however minor, and raises questions about 
the ethics of the process. Any set of rules or guidelines that attempts to
define ethical limits for human experimentation raises controversy among 


. These issues go beyond courtesy or etiquette and concern the appro- 
priate treatment of persons in a free society. Some of these questions have 
been dealt with by scientists and philosophers, by enactments of legislative 
bodies, by codes of ethics and professional organizations, or by guidelines 
established by educational institutions.

In 1974, the Congress of the United States established the National
Commission for the Protection of Human Subjects of Biomedical and
Behavioral Research to formulate guidelines for the research activities of the
National Institutes of Health and the National Institute of Mental Health.

The Commission's 4-year, monthly deliberations, supplemented by 
discussions held at the Smithsonian Institution's Belmont Conference Cen- 
ter, resulted in the publication of the Belmont report: Ethical principles and 
guidelines for the protection of human subjects of research (1979). 

Universities have established human experiment review committees 
to advise academic investigators about appropriate procedures and to ap- 
prove those studies that conform to their ethical guidelines. Most funding 
agencies, private and governmental, require such a review prior to award- 
ing of grants. The university may have ad hoc committees concerned with 
a particular study or standing committees that deal with all experimental 
activities involving the institution or division. In cases where there are 
serious risks that must be weighed against the potential benefits to society, 
reviews by both ad hoc and institution-wide committees may be deemed 
necessary. Some faculty researchers have complained that review commit- 
tees have unduly restricted their experimental activities. It is possible in 
some cases that particular members of the committee did not have the 
technical background to make sound judgments outside their own fields 
of competence. Others have felt that, because the committee assignment 
demanded so much of their time, they could not contribute their best effort. 
However, because it is the primary function of the human experiment 
review committee to maintain the ethical standards of the institution and 
to supervise the ethical guidelines of the funding agencies, it serves a useful 
and necessary purpose. 

In 1953, the American Psychological Association issued its first code 
of ethics for psychologists. In 1963, the code was revised and its preamble 
contained the following statement: 


The psychologist believes in the dignity and worth of the individual human 
being. He is committed to increasing man's knowledge of himself and others. 
While pursuing this endeavor he protects the welfare of any persons who 
may seek his services, or any subject, human or animal, that may be the object 
of his study. He does not use his professional position or relationship, nor
does he knowingly permit his own services to be used by others, for purposes 
inconsistent with these values. While demanding for himself freedom of in- 
quiry and communication, he accepts the responsibility this freedom confers; 
for competence where he claims it, for objectivity in the report of his findings, 
and for the consideration of the best interests of his colleagues and of society. 
(American Psychological Association, 1963, p. 2) 


In 1970, the Board of Directors appointed an Ad Hoc Committee on 
Ethical Standards in Psychological Research to bring the 1963 code up to 
date in light of changes in the science, in the profession, and in the broader
social context in which psychologists practice. The first draft of the com- 
mittee report was circulated among 18,000 members of the association. 
About 5000 responded with suggestions. In addition, journal editors, staff 
members of research review committees, directors of research organiza- 
tions, writers on research ethics, and leaders in such special fields as hyp- 
nosis were interviewed. These contributions were supplemented by dis- 
cussions at regional and national meetings of the association. Psychology 
departments of universities, hospitals, clinics, and government agencies, as 
well as anthropologists, economists, sociologists, lawyers, philosophers, and 
psychiatrists were consulted. 

As a result of these conversations and correspondence with profes- 
sionals from all scholarly disciplines, a final draft was adopted and pub- 
lished in 1973. In 1978, a Committee for the Protection of Human Subjects 
in Psychological Research was established and charged with making annual 
reviews and recommendations regarding the official APA position. These 
annual reviews led to a revision which went through a similar process of 
consultation as the 1973 edition. A final draft which incorporated various 
suggestions was adopted and published in 1982. 

Ten principles were formulated that deal with the experimenter's 
responsibilities toward participants. In the published report, Ethical Prin- 
ciples in the Conduct of Research with Human Participants (American Psycho- 
logical Association, 1982), each principle is stated with discussion of issues, 
problems, and recommendations for appropriate action. The meticulous 
care with which this code was developed attests to the concern of this 
professional organization for ethical practices in psychological research. 
Readers who are interested in a more complete discussion on ethics in 
human experimentation are urged to read this report. 

The following discussion, while not a summary of the American Psy- 
chological Association (APA) code of ethics, is consistent with the APA 
code. The guidelines discussed here are not ethical absolutes. Rather, they 
characterize writing in the field of ethics and a number of professional 
codes that the authors have examined. The guidelines deal with the
following areas of concern: informed consent; invasion of privacy;
confidentiality; protection from stress, harm, or danger; and knowledge of
outcome.


Informed consent. Recruitment of volunteers for an experiment should
always involve the subject's complete understanding of the procedures
employed, the risks involved, and the demands that may be made upon
participants. Whenever possible, subjects should also be informed of the
purpose of the research. When subjects are minors or are mentally
incapacitated due to age, illness, or disability, the informed consent of
parents, guardians, or responsible agents must be secured. This freedom to
participate or to decline to participate is basic, and it includes the freedom
to withdraw from an experiment at any time without penalty. Coercion to
participate or to remain as a participant must not be applied, and any
exploitation of participants is an unethical practice.

The following are examples of experiment recruitment practices that 
might raise ethical questions: 


1. Subjects who are inmates of penal institutions volunteer to participate
because of a need for money or in anticipation of more favorable
treatment or recommendation of earlier parole.

2. Medical students who need money are recruited for experiments by
offers of financial reward.

3. Participants who do not have the mental capacity to give rational
consent (the mentally ill, the mentally retarded, or those with reduced
capacity) are recruited in institutions.

4. Members of a college class are required to participate in an experiment
in order to meet a course requirement.


Invasion of privacy. Ordinarily it is justifiable to observe and record 
behavior that is essentially public, behavior that others normally would be 
in a position to observe. It is an invasion of privacy to observe and record 
intimate behavior that the subject has reason to believe is private. Concealed 
observers, cameras, microphones, or the use of private correspondence 
without the subject's knowledge and permission are invasions of privacy. 
If these practices are to be employed, the researcher should explain the 
reasons and secure permission. 

This statement is not to suggest that intimate behavior cannot be
observed ethically. The sexual behavior studies of Doctors Masters and 
Johnson are based upon observation and recording of the most intimate 
acts, but subjects volunteer to participate with full knowledge of the pur- 
poses and procedures employed. The motivation is based upon confidence 
in the integrity of the researchers and the importance of their scientific 
contributions to human welfare. 


Confidentiality. The ethical researcher holds all information that he 
or she may gather about the subject in strict confidence, disguising the 
participant's identity in all records and reports. No one should be in a
position to threaten the subject's anonymity nor should any information 
be released without his or her permission. 


Protection from physical and mental stress, harm, or danger. In using
treatments that may have a temporary or permanent effect on the subjects, the
researcher must take all precautions to protect their well-being. Treatments
are administered under the direction of competent professional
practitioners in clinical or research facilities where effective and thorough
precautions and safeguards may be assured. Where some risk is unavoidable,
the potential benefits may be sufficient to justify the research. A balance
needs to be achieved, with benefit outweighing risk, in such a case.


Knowledge of outcome. The participant has a right to receive an ex- 
planation for the reasons for the experimental procedures and the results 
of the investigation. The researcher may explain the results and their sig- 
nificance orally, in writing, or by informing participants of the issue of the 
journal in which the report is published. 

Ethical researchers not only observe these ethical guidelines but take 
complete responsibility for the actions of their coexperimenters, colleagues, 
assistants, technical personnel, secretaries, and clerks involved in the proj- 
ect, constantly monitoring their research activities. Researchers have obli- 
gations to their subjects, their professional colleagues, and the public. They 
do not discard unfavorable data that would modify the interpretation of 
their investigation. They make their data available to their professional
peers so that they may verify the accuracy of the results. They honor 
promises made to subjects as a consideration for their participation in a 
study. They give appropriate credit to those who have aided them in their 
investigations, participated in the data analysis, or contributed to the prep- 
aration of the research report. They place scientific objectivity above per- 
sonal advantage and recognize their obligation to society for the advance- 
ment of knowledge. 

Some researchers have been known to justify deception, coercion, 
invasion of privacy, breach of confidentiality, or risks to subjects in the 
name of science, but one might suspect that the prestige, ambition, or ego 
of the experimenter was the primary motivation. 


USING THE LIBRARY 


The student should become thoroughly acquainted with the university 
library, the location of its varied facilities, and the services it provides. In 
addition to the traditional card catalog, many university libraries have com- 
puterized their holdings and have placed terminals in various locations for 
ease of finding books and periodicals. 

Sometimes a student learns of a reference that is not available in the 
local library. Most libraries belong to one of three major shared cataloging 
systems: Online Computer Library Center (OCLC) with the holdings of 
over 3000 libraries; Research Library Network; and, in the Pacific North- 
west, the Washington Library Network. The list of books and periodicals 
available, and the libraries holding these materials, can be quickly accessed 
on a time-sharing computer system available in most libraries. The student's
library requests the books or a photocopy of the article, which is then loaned
to the student by his or her library.


FINDING RELATED LITERATURE 


Students often waste time searching for references in an unsystematic way. 
The search for references is an ever-expanding process, for each reference 
may lead to a new list of sources. Researchers may consider these sources 
as basic: 


1. The Education Index
2. Resources in Education
3. Current Index to Journals in Education
4. Index to Doctoral Dissertations and Dissertation Abstracts International
5. Other specialized indexes or abstracts indicated by the area of
investigation (e.g., Psychological Abstracts)


Appendix I lists indexes and abstracts that the student or researcher
may use to find articles and books on his or her topic. Many of these data
bases, including the Educational Resources Information Center (ERIC),
Exceptional Child Education Resources, Psychological Abstracts, and dozens
of others, can be accessed directly through one of the computer services
available to libraries. Almost all college and university libraries, and many
public libraries, offer this service. The investigator, with the help of a
librarian, uses key words to let the computer system know which materials
are desired. For instance, if a researcher is reviewing the literature that has
used Piagetian theory with mentally retarded persons, she or he might use
the key words Piaget with mental retardation, mentally retarded, and
retardation. The computer then searches all the titles and abstracts for those
containing both Piaget and one of the other key words. The investigator can
then have the titles or titles and abstracts printed either “online” or, less
expensively, overnight at the computer services facilities. Considering the
time that is saved by using a computer search facility, the cost is minimal.
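
The key-word logic described above (Piaget together with at least one of the other terms) can be sketched in a few lines. This is a simplified illustration, not the method of any particular search service: the records are invented placeholders, and matching is done by plain substring comparison on lowercased titles and abstracts.

```python
def matches(record, required, alternatives):
    """True if the record contains the required term AND any alternative."""
    text = (record["title"] + " " + record["abstract"]).lower()
    has_required = required.lower() in text
    has_alternative = any(term.lower() in text for term in alternatives)
    return has_required and has_alternative


# Invented placeholder records, used only to exercise the logic.
RECORDS = [
    {"title": "Piagetian conservation tasks with mentally retarded adolescents",
     "abstract": "Reports performance on conservation tasks."},
    {"title": "Piaget and moral judgment in gifted children",
     "abstract": "A study of gifted pupils."},
    {"title": "Vocational training and mental retardation",
     "abstract": "Workshop outcomes."},
    {"title": "Object permanence in severely handicapped children",
     "abstract": "Applies Piaget's stages; sample drawn from a retardation center."},
]

ALTERNATIVES = ["mental retardation", "mentally retarded", "retardation"]
hits = [r["title"] for r in RECORDS if matches(r, "Piaget", ALTERNATIVES)]
for title in hits:
    print(title)
```

Only the first and last placeholder records are retrieved: the second lacks any of the alternative terms, and the third never mentions Piaget.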


Microfiche 


The development of microfiche has been one of the most significant con- 
tributions to library services by providing economy and convenience of 
storing and distribution of scholarly materials. 

A microfiche is a sheet of film that contains microimages of printed
materials. Filmed at a reduction of 1 to 24 or higher, nearly one hundred
8½" x 11" pages of copy can be reproduced on one 4" x 6" film card.
Microfiche readers that magnify the microimages to original or larger copy
size are available at libraries. Some microfiche readers provide up to 40x
magnification on screens as large as 15" x 21".
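
The page capacity quoted above can be checked with simple arithmetic. The sketch below assumes a card laid out 6" wide by 4" high, pages filmed upright at a 24-fold linear reduction, and no space reserved for margins or a title strip; these layout assumptions are ours, so the result is an upper bound rather than a description of any actual fiche format.

```python
import math


def frames_per_card(page_w, page_h, card_w, card_h, reduction):
    """Upper bound on page images that fit on one card (inches)."""
    frame_w = page_w / reduction   # width of one reduced page image
    frame_h = page_h / reduction   # height of one reduced page image
    columns = math.floor(card_w / frame_w)
    rows = math.floor(card_h / frame_h)
    return columns, rows, columns * rows


columns, rows, total = frames_per_card(8.5, 11.0, 6.0, 4.0, 24)
print(columns, rows, total)  # 16 8 128
```

An upper bound of 128 frames is consistent with the figure of nearly one hundred pages once room is set aside for margins and an eye-readable heading.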

There are many document reproduction services that supply micro- 
fiche to libraries upon subscription or upon special order. 

The Educational Resources Information Center (ERIC) has prepared 
for the National Institute of Education a directory of nearly 600 libraries 
that possess extensive collections or receive regular periodic shipments of 
ERIC microfiche collections. These libraries receive approximately 17,000 
microfiche per year. Other document reproduction sources that provide 
microfiche to libraries or individuals are described later in this chapter. 


Super- and Ultra-Microfiche 


Recent developments in the field of micro-printing will transform the pro- 
cess of storage, retrieval, and distribution of published materials in libraries 
of the future. A super-microfiche has been developed that contains up to 
1000 pages of printed material on a single 4" x 6" transparent card, the 
equivalent of two or more books. An even more spectacular development 
is the ultra-microfiche that contains up to 3200 microdots on a single card. 
When projected, each dot contains the equivalent of several pages. Thus, 
seven to ten volumes could be contained on a single 4" x 6" transparent 
card. Reader printers make hard copy printouts (8½" x 11" reproductions)
of any page in a few seconds. 


NOTE TAKING 


One of the most important research activities of the graduate student is 
note taking—putting materials in a form that can easily be recalled and 
used in the future. Notes will result from speeches and lectures, class dis- 
cussions, conversation, from solitary meditation, and from reading refer- 
ence materials. In preparing term papers and research reports the notes 
that result from reading will be most significant. Without a careful,
systematic method of note taking, much of what is read is quickly forgotten.


Reading-reference notes have been classified under four principal 
categories: 


1. Quotation. The exact words of an author are reproduced, enclosed in 
quotation marks. It is essential to copy each statement accurately, and 
to indicate the exact page reference so that the quotations may be 
properly referenced in the written report. 


2. Paraphrase. The reader restates the author's thoughts in his or her
own words.

3. Summary. The reader states in condensed form the contents of the
article.

4. Evaluation. The reader records his or her own reaction, indicating
agreement or disagreement, or interpreting the point of view of the
writer.


A single note card may include several of these types when it seems 
appropriate. 


A SUGGESTED METHOD 
FOR TAKING NOTES 


1. Skim the reference source before taking any notes. A bird's-eye view
is essential before one can decide what material to record and use. 
Selecting the most significant material is a skill to be cultivated. 

2. Use 4" x 6" index cards. They are easily sorted by subject headings 
and are large enough to include a reasonable amount of material. 
Some students prefer 5" x 8" cards, which are less convenient to carry 
but provide more space for notes. 

3. File each note card under a definite topic or heading. Place the subject 
heading at the top of the card for convenient filing. A complete 
bibliographic citation should be placed at the bottom of the note card. 
If a book has been used, the call number should be indicated to 
facilitate library location in the future. (See Fig. 2-1.)

4. Include only one topic on a card. This makes organization of notes 
flexible. If the notes are lengthy, use consecutively numbered cards, 
and slip a rubber band around them before filing. 

5. Be sure that notes are complete and clearly understandable, for they
are not likely to be used for some time after they have been taken. 

6. Distinguish clearly between a summary, a direct quotation of the au- 
thor, a reference to the author's source, and an evaluative statement. 

7. Do not plan to recopy or type your notes. It wastes time and increases 
possibility of error and confusion. Copy your notes carefully the first 
time. 

8. Keep a supply of note cards with you at all times, so that you can jot 
down ideas that come to you while waiting, riding the bus, or listening 
to a lecture or discussion. 

9. Be careful not to lose your notes. As soon as they are copied, file them
in a card index box. If you must carry them with you, use the
4" x 6" or 5" x 8" accordion file folder, and be sure that your name
and address are clearly printed on it.

10. Keep a permanent file of your notes. You may find the same notes
useful in a number of courses or in writing a number of reports.

FIGURE 2-1 Note card (4" x 6").


When taking notes, consider the advisability of making photostatic
copies of book and journal pages so that they can be examined more
efficiently at home. Many book and journal materials are reproduced on
microfiche cards that may include as many as 1000 pages on one 4" x 6"
card. The rapid trend toward microfilming and microfiching professional
literature will continue as the constantly increasing volume of published
materials burdens limited shelf space. Coin-operated microfilm and
microfiche printers are found in most university libraries. Reproduction is
not expensive and the quality of the copy is excellent. It is much more
convenient to take notes from an 8½" x 11" print copy than from film
projected on a screen.


References and Bibliography 


In preparing a journal report, paper, or research proposal the author is
expected to include a list of the references that have been cited in the text.
Sometimes it is preferable to include additional materials that were used
by the author but not actually cited in the paper. In this case the author
would provide a bibliography which includes all the relevant references,
cited or not.

FIGURE 2-2 Bibliography card.

The most convenient way to assemble and organize references or a 
bibliography is by the use of bibliography cards. The card includes the 
names of the authors, the facts of publication, and the annotation (see 
Chapter 11 for examples using the American Psychological Association 
system). Placing the information on cards makes it easy to assemble the 
authors' names in the alphabetical order in which they are listed in the 
bibliography of the report. (See Fig. 2-2.) 
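
For students who keep their bibliography cards in a computer file rather than a card box, the same idea can be sketched as follows. The field names and the sample entries are hypothetical placeholders chosen for illustration; they are not taken from the APA system described in Chapter 11.

```python
from dataclasses import dataclass


@dataclass
class BibliographyCard:
    """One card: authors, facts of publication, and an optional annotation."""
    authors: str
    year: int
    title: str
    publication: str
    annotation: str = ""


def alphabetize(cards):
    """Order cards by author name, then by year, ignoring capitalization."""
    return sorted(cards, key=lambda card: (card.authors.lower(), card.year))


# Placeholder cards, not real references.
cards = [
    BibliographyCard("Smith, A.", 1980, "Placeholder title 1", "Placeholder journal"),
    BibliographyCard("Adams, B.", 1975, "Placeholder title 2", "Placeholder press"),
    BibliographyCard("Smith, A.", 1972, "Placeholder title 3", "Placeholder journal"),
]

for card in alphabetize(cards):
    print(f"{card.authors} ({card.year}). {card.title}. {card.publication}.")
```

Sorting on the author and year together keeps several works by the same author in chronological order, as a shuffled stack of cards would be arranged by hand.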


THE FIRST RESEARCH PROJECT 


Experience has indicated that one way to understand the methodology and 
processes of research is to engage in research. Such a project may be very 
modest in nature and necessarily limited by time, the experience of the 
student, and many other factors associated with the graduate student's 
other obligations. However, the methodology may be learned by actively 
engaging in the research process under the careful supervision of the 
instructor in the beginning course in research. Respectable research proj- 
ects have been undertaken and reported on within a semester's time, even 
within an 8-week summer session. Although most of these studies have 
been of the descriptive-survey type, some simple historical and experi- 
mental studies have also been completed. The emphasis must necessarily 
be placed on the process rather than on the product or its contribution to 
the improvement of educational practice. A study chosen for this first
project, however, may not be of great enough significance to serve as an
appropriate thesis problem.

The full-scale project may be either an individual or a group enterprise.
Groups of three to five graduate students can profitably work together
on the planning of the study. Data-gathering devices may be chosen
or constructed through joint enterprise. Data may be gathered within the
university graduate class, or in the classrooms, schools, or communities in
which the group's members teach. However, it is recommended that the
next steps (organization and analysis of data and the writing of the final
report) be an individual project. There is always the danger in a group
project of “letting George do it,” and incidentally letting George get all the
benefit from the experience.

This recommended combination of group effort in the initial stages
and individual effort in the later stages represents a compromise that seems
effective and enables students to carry through a study in a limited amount
of time with reasonable opportunity for personal growth. For some of those
who will write a thesis in partial fulfillment of degree requirements, this
first project may serve as preparation. For others, it may initiate a study
capable of subsequent expansion into a thesis or dissertation.


Many research-course instructors believe that a more practical re- 
quirement would be the preparation of a carefully designed research pro- 
posal rather than a limited-scope study. There is much to be said for this 
point of view because the beginning research student is inexperienced, the
time is short, and there is a real danger of conveying a superficial concept
of sound research.

The following topics were selected by inexperienced student researchers
who were carrying on a project or writing a proposal in partial fulfillment
of the requirements of a beginning course in educational research. Most of
the topics were short, action-type descriptive studies, not based upon
random selection and random assignment of subjects or observations. Notice
that the wording of the titles did not imply generalization of the
conclusions to a wider population. The primary purpose was a learning
exercise, not a contribution to a field of knowledge.


TOPICS USED BY STUDENTS
IN A BEGINNING GRADUATE COURSE
IN EDUCATIONAL RESEARCH


1. The Attitudes of a Group of University Seniors toward Coeducational
Dormitories

2. The Reading Skill Development of a Deaf First-grade Child

3. The Status of Latin in Indiana High Schools

4. Discipline Problems at Washington High School as Viewed by a Group
of Seniors

5. A Study of the Effectiveness of Chisanbop Calculation with a Group
of Third-grade Pupils

6. The Status of Music in the Western Yearly Meeting of the Society of
Friends

7. The Rehabilitation of a Group of Heroin Addicts in a Federally Funded
Drug Treatment Center

8. The Social Development of a 6-year-old Autistic Child

9. The Effect of Trial Promotion on the Academic Achievement of a
Group of Underachievers

10. The Effects of Parent Visitation upon the Reading Performance of a
Group of Fourth-grade Students

11. The Predictive Value of Entrance Examinations at the Methodist School
of Nursing

12. The Interests of a 3-year-old Boy

13. Student Participation in the Activity Program at Lawrence Central
High School

14. A Comparison of Regular Classroom and Learning Disabled Children’s
Expressive Language Skills

15. The Effect of Verbal Mediation upon the Mathematical Achievement
of Learning Disabled Students

16. The Effect of Teacher Education on Attitudes toward Mainstreaming

17. Effect of Verbalization on Performance and On-Task Behavior of
Reading Disabled Children

18. Prevalence of Behavior Problems of Hearing Siblings of Deaf Children

19. The Influence of Kindergarten Experience on the Subsequent Reading
Achievement of a Group of Third-grade Pupils

20. The Views of Selected Baptist Laymen, Ministers, and National Church
Leaders Concerning Issues Relating to the Tradition of Separation
of Church and State

21. The Attitudes and Behavior of Freshmen and Seniors Regarding
Classroom Dishonesty at Sheridan High School

22. The Attitudes of a Group of Florida School Superintendents toward
Mandated Minimum Competency Testing

23. Authority Images of a Selected Group of Inner City Children

24. The Achievement of Twins, Both Identical and Fraternal, in the
Lebanon, Indiana Metropolitan School District

25. A Follow-up Study of Nonpromoted Students at School #86

26. A History of the Indiana Boy’s School, Plainfield, Indiana

27. A Comparative Analysis of the Self-Concepts of a Group of Gifted
and Slow-learning Children

28. The Attitudes of a Group of High School Seniors toward Nuclear
Protest Movements

29. The Educational Backgrounds of 129 American Celebrities Listed in
the 1966 Current Biography Yearbook

30. The Attitudes of a Group of Graduate Students toward Mandated
Smoking Restrictions in Public Facilities

31. The Influence of Entering Age upon the Subsequent Achievement
at First-, Second-, and Third-grade Levels in Washington Township

32. The Attitudes of a Selected Group of Black and White Parents toward
Busing to Achieve Racial Integration

33. A Study of Socioeconomic Status in the Butler-Tarkington Area, a
Racially Integrated Community

34. A Follow-up Study of the 1970 Graduates of Grace Lutheran School

35. The Effect of Title IX, Prohibiting Sex Discrimination in Public Schools,
upon the Athletic Budgets of Illinois Public Colleges and Universities


For experienced researchers, projects would necessarily be more
theory-oriented, with conclusions generalized beyond the specific group
observed. At this more advanced level a careful process of randomization
would be desirable, if not necessary, and the research design would be 
much more rigorous. The details of some of the more sophisticated pro- 
cedures are partially explained in subsequent chapters of this text and in 
other relevant sources, particularly discussion of experimental and descrip- 
tive research processes, the selection or construction of data-gathering de- 
vices, and the statistical analysis of data. 
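
As a rough illustration of the randomization mentioned above, the following
Python sketch draws a simple random sample from a roster and then randomly
assigns the sampled subjects to groups. It is only a sketch under stated
assumptions: the roster, the sample size, the group names, and the function
name select_and_assign are all hypothetical, and an actual study would follow
the sampling procedures discussed in later chapters.

```python
import random


def select_and_assign(population, sample_size,
                      groups=("experimental", "control"), seed=None):
    """Randomly select subjects, then randomly assign them to groups.

    Every name in this sketch is an illustrative assumption.
    """
    rng = random.Random(seed)  # a seeded generator makes the draw repeatable
    # Random selection: each member has an equal chance of being chosen.
    sample = rng.sample(list(population), sample_size)
    # Random assignment: shuffle the sample, then deal subjects out in turn
    # so that group sizes differ by at most one.
    rng.shuffle(sample)
    assignment = {name: [] for name in groups}
    for index, subject in enumerate(sample):
        assignment[groups[index % len(groups)]].append(subject)
    return assignment


# Hypothetical roster of 200 student identifiers.
roster = [f"student_{n:03d}" for n in range(1, 201)]
result = select_and_assign(roster, sample_size=40, seed=1)
for group, members in result.items():
    print(group, len(members))
```

Selection and assignment are kept as separate steps because they serve
different purposes: random selection supports generalizing to the wider
population, while random assignment supports comparing the groups with one
another.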


SUBMITTING A RESEARCH PROPOSAL
TO A FUNDING AGENCY 


Seasoned researchers may plan to submit research proposals to foundations 
or government agencies for financial support. The beginning researcher 
may not feel the need for suggestions, but it may be helpful to understand 
the detailed type of information that a foundation or agency would expect 
to receive before committing its funds. 


The following is a list of suggestions for those who seek financial 
support: 


1. Write the proposal very carefully. A carelessly written proposal suggests
to the evaluators that the research project would be carelessly done. It is
also useful to follow the format recommended by the agency in writing the
proposal.

2. Pay attention to stated goals and priorities of the foundation or agency.
It is important to point out how your study would be relevant to these
goals.

3. State your problem in such a way that the proposal evaluators, who are
capable and experienced in judging research proposals but know nothing about
your project, will be able to judge its worth and the likelihood of its
contributing to a significant area of knowledge.

4. Indicate how your study will add to or refine present knowledge.

5. State your hypothesis or hypotheses in both conceptual and operational
terms and in both substantive and null form.

6. Indicate that you are completely familiar with the field of investigation
and are aware of all recent studies in the problem area.

7. Indicate how you propose to test your hypotheses, describing your
research design and the data-gathering instruments or procedures that you
will use, indicating their known validity and reliability.

8. Describe your sampling procedures, indicating how you will randomly
select and randomly assign your subjects or observations.

9. Indicate the extraneous variables that must be recognized and explain
how you propose to minimize their influence.

10. Explain the statistical procedures that you will employ, indicating any
computer application that you will use.

11. Prepare a budget proposal estimating the funds required for

a. wages, including any fringe benefits

b. purchase or rental of special equipment or supplies

c. travel expenses

d. clerical expenses

e. additional overhead expenses that may be involved

f. publication costs

12. Provide some tangible evidence of your competence by listing

a. research projects that you have carried on or actively participated in

b. your scholarly journal articles, including abstracts of your studies

c. your academic training and other qualifications


SUMMARY


Academic research projects are usually required in partial fulfillment of
the requirements of a course or a degree program. The motivation is not 
always a genuine desire to engage in research. In addition, limitations of 
time, money, and experience usually preclude the consideration of problems
that could make significant contributions to educational theory and
practice.

The choice of a suitable problem is one of the most difficult tasks facing
the beginning researcher. Students tend to define problems that are too
broad in scope or that deal with too fragmentary aspects of the problem.
Consultation with the course instructor or advisor is particularly helpful
in identifying a problem that is manageable and significant enough to
justify the time and effort that will be required.

Problems are found in the teachers’ daily classroom, school, and community
experiences. Technological and social changes call for research evidence to
chart new courses in educational practice. Graduate academic experience
helps to promote problem awareness through classroom activities, the
reading of research studies, and interaction with instructors, advisors,
and fellow students.

A good research problem has the qualities of significance, originality, and
feasibility. The researcher should evaluate a proposed problem in the light
of his or her competence, the availability of data, the financial demands
of the project, the limitations of time, and the possible difficulties and
social hazards involved.

A research proposal is required by many institutions and services as a
useful basis for the evaluation of a project as well as a guide for the
researcher. The proposal contains a clear and concise statement of the
problem; the hypothesis or hypotheses involved; a recognition of the
significance of the problem; definitions of important terms; assumptions,
delimitations, and limitations; a review of related literature; an analysis
of proposed research procedures; a reference list; and a time schedule.
Some advisors request a progress report from time to time to evaluate the
progress of the investigation.

One way to learn about research is to conduct a study in connection with
the beginning research course. Another way is to write a research proposal
which may involve all the steps in the research process except the
gathering and analysis of data and the formulation of conclusions. Either
of these exercises gives a focus to the discussion about research and may
help in developing some competence and the research point of view. It may
even encourage some teachers to conduct modest studies in their own schools
during or after the completion of their graduate programs.


EXERCISES


1. The following research topics are faulty or are completely inappropriate.
Revise each one, if possible, so that it describes a feasible project or
proposal for this course.


a. The Attitudes of Teachers toward Merit Rating

b. How to Teach Poetry Most Effectively

c. The Best Way to Teach Spelling

d. The Evils of Alcohol

e. Does Ability Grouping Meet the Needs of Students?

f. The Adequacy of Law Enforcement

g. The Hazards of Smoking

h. Why the Discussion Method Is Better than the Lecture Method

i. The Fallacy of Evolution

2. State a hypothesis, first in scientific or research form, and then in null or sta- 
tistical form. 

3. Define the following terms in operational form:

a. intelligence

b. creativity

c. coordination

d. authorization

e. memory

4. In a research study, is a hypothesis to be tested always preferable to a question 
to be remembered? Why or why not? 

5. What are some of the more effective ways to find a suitable research
problem?


HISTORICAL RESEARCH


Historical research differs markedly from the sort of research conducted 
by most scientists, including behavioral and social scientists. In fact, it is so 
different from other types of research that it almost does not belong as a 
topic in this book. It is included because many areas of concern to education 
can best be studied in this way, because the quantity and quality of research 
on the history of education have increased greatly in the past two decades
(e.g., Best, 1983; Warren, 1978), and because a review of the research 
literature which is done prior to other types of research is, in effect, a 
historical study. 

History is a meaningful record of human achievement. It is not merely 
a list of chronological events but a truthful integrated account of the re- 
lationships between persons, events, times, and places. We use history to 
understand the past and to try to understand the present in light of past 
events and developments. We also use it to prevent "reinventing the wheel" 
every few years. Historical analysis may be directed toward an individual, 
an idea, a movement, or an institution. However, none of these objects of 
historical observation can be considered in isolation. People cannot be sub- 
jected to historical investigation without some consideration of their inter- 
action with the ideas, movements, and/or institutions of their times. The 
focus merely determines the points of emphasis toward which historians 
direct their attention.


TABLE 3-1 Some Examples of the Historical Interrelationship among Men,
Movements, and Institutions

                                              INSTITUTIONS
MEN                  MOVEMENTS                General Type               Name

Ignatius of Loyola   Counter-Reformation      Religious Teaching Order   Society of Jesus, 1534
                                                                         (Jesuit Society)

Benjamin Franklin    Scientific Movement,     Academy                    Philadelphia Academy, 1751
                     Education for Life

Daniel Coit Gilman   Graduate Study and       University Graduate        Johns Hopkins University, 1876
G. Stanley Hall      Research                 School                     Clark University, 1887
Wm. Rainey Harper                                                        University of Chicago, 1892

John Dewey           Experimentalism,         Experimental School        University of Chicago
                     Progressive Education                               Elementary School, 1896

W. E. B. Dubois      Racial Integration in    Persuasion Organization    National Assn. for the
Walter White         the Public Schools                                  Advancement of Colored
                                                                         People, 1909

B. R. Buckingham     Scientific Research in   Research Periodical,       Journal of Ed. Research, 1920
                     Education                Research Organization      American Educational
                                                                         Research Assn., 1931


Table 3-1 illustrates several historical interrelationships that have been
taken from the history of education. For example, no matter whether the
historian chooses for study the Jesuit Society, religious teaching orders, the 
Counter-Reformation, or Ignatius of Loyola, each of the other elements 
appears as a prominent influence or result and as an indispensable part of 
the account. The interrelationship of this institution, movement, and man 
would make the study of one in isolation from the others meaningless, if 
not impossible. 

Those who wish to engage in historical research should read the works 
of historians regarding the methods and approaches to conducting histor- 
ical studies in education (e.g., Best, 1983; Billington, 1975; Brickman, 1982; 
Gottschalk, 1950; Hockett, 1948; Warren, 1978). 


THE HISTORY OF AMERICAN EDUCATION


Historical studies deal with almost every aspect of American education. 
Such investigations have pointed out the important contributions of both 
educators and statesmen. They have examined the growth and development of
colleges and universities, elementary and secondary schools, educational
organizations and associations, the rise and decline of educational
movements, the introduction of new teaching methods, and the issues that 
have persistently confronted American education. 

An understanding of the history of education is important to profes- 
sional workers in this field. It helps them to understand the how and why 
of educational movements that have appeared and, in some cases, continue 
to prevail in the schools. It helps them to evaluate not only lasting contri- 
butions but also the fads and “bandwagon” schemes that have appeared 
on the educational scene only to be discarded.

An examination of many developments of the past seems to confirm
the observation that little in education is really new: Practices hailed as 
innovative are often old ideas that have previously been tried and replaced 
by something else. Innovators should examine the reasons why such prac- 
tices were discarded and consider whether their own proposals are likely 
to prove more successful. Several studies, briefly described, illustrate the 
historical background of some contemporary educational movements and 
issues. 

Organized programs of individualized instruction introduced in a 
number of school systems in the 1960s seem to be similar in many respects 
to those introduced in a number of schools in the 1890s and in the first 
quarter of the twentieth century. First introduced at Pueblo, Colorado, and 
known as the Pueblo Plan, later modified and known as the Winnetka and 
Dalton Plans, these programs do seem to have common elements. Dispens- 
ing with group class activity in academic courses, students were given units 
of work to complete at their own rate before proceeding to more advanced 
units. Individual progress based upon mastery of subject matter units was 
the criterion for promotion or completion of a course. Search (1901)
advocated this plan, and his influence upon Carleton Washburne in the
elementary schools of Winnetka, Illinois, and Helen Parkhurst in the
secondary schools at Dalton, Massachusetts, is generally recognized. Whether the
Pueblo, Winnetka, or Dalton plans were fads or sound programs, the fact 
remains that they disappeared from the schools before reappearing in the 
1960s. 

The place of religion in public education is an issue that concerns 
many people. In the period following World War II, in a series of Supreme 
Court decisions, religious instruction and religious exercises within public
schools have been declared unconstitutional and in violation of the First
Amendment of the United States Constitution. In 1963, in the case of
Abington School District v. Schempp, the Court held that a Pennsylvania law 
requiring daily Bible reading was in violation of the First Amendment. 
Much resentment and criticism of the Supreme Court followed this deci- 
sion, and several efforts have been made to introduce amendments to the 
Constitution to permit religious exercises in the public schools. 

The Bible reading issue was also a bitter one more than 100 years ago. The
Philadelphia Bible Riots of 1840 (Lannie & Diethorn, 1968) resulted in the
deaths of about 45 soldiers and civilians, serious injury to about 140, and
property damage to homes and churches valued at nearly $500,000.
Nativist/foreign-born and Catholic/Protestant conflicts produced the tense
atmosphere, but the Bible reading issue precipitated the
riots. It is apparent that Bible reading is not an issue of recent origin and 
that an understanding of previous conflicts places the issue in clearer 
perspective. 

The contributions of Thomas Jefferson, Benjamin Franklin, Calvin Stowe,
Catherine Beecher, Horace Mann, Maria Montessori, Henry Barnard, Ella Flagg
Young, William Holmes McGuffey, Daniel Coit Gilman, John Dewey, and many
other eminent educators have been carefully examined in many studies, and
their impact on American education has been
noted. 

Thursfield (1945) studied Henry Barnard's American Journal of Edu- 
cation, published in 31 massive volumes between 1855 and 1881. He points 
out the Journal's vital contribution to the development of American edu- 
cation. Through its comprehensive treatment of all aspects of education it 
provided a readily available medium for the presentation and exchange of 
ideas of many of the great educators of the period. It has been stated that 
almost every educational reform adopted in the last half of the nineteenth 
century was largely due to the influence of the Journal. Among its
contributors were Henry Barnard, Horace Mann, Bronson Alcott, Daniel Coit
Gilman, William T. Harris, Calvin Stowe, and Herbert Spencer, in addition
to many prominent foreign contributors. 

Cremin (1961) examined the reason for the rise and decline of the 
Progressive Education movement, including the major changes in philos- 
ophy and practices that transformed American education and the forces 
that brought the movement to a halt in the 1950s. Although some historians
differ with his conclusions, Cremin's analysis is the definitive history of 
Progressive Education in America.

These historical studies are examples of but a few of the thousands
of books, monographs, and periodical articles that depict the story of Amer- 
ican education. In addition to examining these works, students are urged 
to consult the History of Education Quarterly, in which scholarly book reviews 
and critical analyses of contemporary historical research are presented. 


HISTORY AND SCIENCE 


Opinions differ as to whether or not the activities of the historian can be 
considered scientific or whether there is such a thing as historical research. 


Those who take the negative position may point out the following
limitations:


1. Although the purpose of science is prediction, the historian cannot
usually generalize on the basis of past events. Because past events were
often unplanned or did not develop as planned, because there were so many
uncontrolled factors, and because the influence of one or a few individuals
was so crucial, the same pattern of factors is not repeated.

2. The historian must depend upon the reported observations of others,
often witnesses of doubtful competence and sometimes of doubtful
objectivity.

3. The historian is much like a person trying to complete a complicated
jigsaw puzzle with many of the parts missing. On the basis of what is often
incomplete evidence, the historian must fill in the gaps by inferring what
has happened and why it happened.

4. History does not operate in a closed system such as may be created in
the physical science laboratory. The historian cannot control the
conditions of observation nor manipulate the significant variables.


Those who contend that historical investigation may have the
characteristics of scientific research activity present these arguments:


1. The historian delimits a problem, formulates hypotheses or raises
questions to be answered, gathers and analyzes primary data, tests the
hypotheses as consistent or inconsistent with the evidence, and formulates
generalizations or conclusions.

2. Although the historian may not have witnessed an event or gathered data
directly, he or she may have the testimony of a number of witnesses who
have observed the event from different vantage points. It is possible that
subsequent events have provided additional information not available to
contemporary observers. The historian rigorously subjects the evidence to
critical analysis in order to establish its authenticity, truthfulness, and
accuracy.

3. In reaching conclusions, the historian employs principles of probability
similar to those used by physical scientists.

4. Although it is true that the historian cannot control the variables
directly, this limitation also characterizes most behavioral research,
particularly nonlaboratory investigations in sociology, social psychology,
and economics.

5. The observations of historians may be described in qualitative or
quantitative terms depending on the subject matter and the approach of the
historian. In general, the traditional approach is qualitative while the
revisionists use quantitative analyses. The traditional, qualitative
approach in many historical studies does not preclude the application of
scientific methodology. As Brickman (1982) points out, it simply requires
“the synthesis and presentation of the facts in a logically organized form”
(p. 91).


Historical Generalization


There is some difference of opinion, even among historians, as to whether
or not historical investigations can establish generalizations. Most historians 
would agree that some generalizations are possible, but they disagree on 
the validity of applying them to different times and places. Gottschalk 
(1963) states the case of the comparative historian in this way: 


Sooner or later one or more investigators of a period or area begin to suspect
some kind of nexus within the matter of their historical investigation. Though 
such “hunches,” “insights,” “guesses,” "hypotheses"—whatever you may call 
them—may be rejected out of hand by some of them, the bolder or rasher 
among them venture to examine the possibility of objective reality of such a 
nexus, and then it is likely to become a subject of debate, and perhaps of 
eventual refinement to the point of wide recognition in the learned world. 
The process is not very different from the way analytical scholars in other 
fields proceed—Darwin, for example, or Freud. If this process serves no other
purpose, it at least may furnish propositions upon which to focus future
investigations and debates. . . . 

But do not these historical syntheses, no matter what their author's inten- 
tion, invariably have a wider applicability than to any single set of data from 
which they rose? If Weber was right, isn't it implicit in this concept of the 
Protestant ethic that where a certain kind of religious attitude prevails, there 
the spirit of capitalism will, or at least may, flourish? . . . If Mahan was right, 
couldn't victory in war (at least before the invention of the airplane) be re- 
garded as dependent on maritime control? If Turner was right, won't his 
frontier thesis apply to some extent to all societies that have frontiers to 
conquer in the future, as well as it has applied to American society in the 
past? (pp. 121-122)¹


Finley (1963) comments on generalization: 


Ultimately the question at issue is the nature of the historian's function. Is it 
only to recapture the individual concrete events of a past age, as in a mirror, 
so that the progress of history is merely one of rediscovering lost data and 
of building bigger and better reflectors? If so, then the chronicle is the only 
correct form for his work. But if it is to understand—however one chooses 
to define the word—then it is to generalize, for every explanation is, or 
implies, one or more generalizations. (p. 34) 


Aydelotte (1963) states the argument for generalization: 


Certainly the impossibility of final proof of any historical generalization
[must] be at once conceded. Our knowledge of the past is both too limited
and too extensive. Only a minute fraction of what has happened has been
recorded, and only too often the points on which we need most information
are those on which our sources are most inadequate. On the other hand, the
fragmentary and incomplete information we do have about the past is too
abundant to prevent our coming to terms with it; its sheer bulk prevents its
being easily manipulated, or even easily assimilated, for historical
purposes. Further, historians deal with complex problems, and the pattern of
the events they study, even supposing it to exist, seems too intricate to be
easily grasped. Doubtless, finality of knowledge is impossible in all areas
of study. We have learned through works of popularization how far this holds
true even for the natural sciences, and, as Crane Brinton says, the
historian no longer needs to feel that “the uncertainties and inaccuracies
of his investigation leave him in a position of hopeless inferiority before
the glorious certainties of physical science.” (pp. 156-157)


¹ From “Categories of Historical Generalization” in Generalization in the
Writing of History, Louis Gottschalk, ed. (Chicago: University of Chicago
Press, 1963), 121-22. Used with permission.


The foregoing quotations are presented in support of the position 
that the activities of the historian are not different from those of the sci- 
entist. Historical research as it is defined in this chapter includes delimiting 
a problem, formulating hypotheses or generalizations to be tested or ques- 
tions to be answered, gathering and analyzing data, and arriving at prob- 
ability-type conclusions or at generalizations based upon deductive-induc- 
tive reasoning. 


THE HISTORICAL HYPOTHESIS 


Nevins (1962) illustrates the use of hypotheses in the historical research of 
Edward Channing in answering the question, "Why did the Confederacy 
collapse in April 1865?" Channing formulated four hypotheses and tested 
each one in light of evidence gathered from letters, diaries, and official 
records of the army and the government of the Confederacy. He hypoth- 
esized that the Confederacy collapsed because of 


1. The military defeat of the Confederate army

2. The dearth of military supplies 

3. The starving condition of the Confederate soldiers and the civilians 
4. The disintegration of the will to continue the war 


Channing produced evidence that seemed to refute the first three 
hypotheses. More than 200,000 well-equipped soldiers were under arms 
at the time of the surrender, the effective production of powder and arms 
provided sufficient military supplies to continue the war, and enough food 
was available to sustain fighting men and civilians. 

Channing concluded that hypothesis 4, the disintegration of the will to 
continue the war, was substantiated by the excessive number of desertions 
of enlisted men and officers. Confederate military officials testified that 
they had intercepted many letters from home urging the soldiers to desert.
Although the hypothesis sustained was not specific enough to be particularly
helpful, the rejection of the first three did claim to dispose of some
commonly held explanations. This example illustrates a historical study in 
which hypotheses were explicitly stated. 


Hypotheses in Educational Historical Research 


Hypotheses may be formulated in historical investigations of education. 
Several examples are listed. 


1. The educational innovations of the 1950s and 1960s were based upon
practices that previously have been tried and discarded. 

2. Christian countries whose educational systems required religious in- 
struction have had lower church attendance rates than those countries 
in which religious instruction was not provided in the schools. 

3. The observation of European school systems by American educators 
during the nineteenth century had an important effect upon Amer- 
ican educational practices. 

4. The monitorial system had no significant effect upon American ed- 
ucation. 


Although hypotheses are not always explicitly stated in historical
investigations, they are usually implied. The historian gathers evidence and
carefully evaluates its trustworthiness. If the evidence is compatible with
the consequences of the hypothesis, it is confirmed. If the evidence is not
compatible, or negative, the hypothesis is not confirmed. It is through such
synthesis that historical generalizations are established.

The activities of the historian, when education is his or her field of
inquiry, are no different from those employed in any other field. The
sources of evidence may be concerned with schools, educational practices
and policies, movements, or individuals, but the historical processes are
the same.


Difficulties Encountered in Historical Research


The problems involved in the process of historical research make it a
somewhat difficult task. A major difficulty is delimiting the problem so
that a satisfactory analysis is possible. Too often, beginners state a
problem much too broadly; the experienced historian realizes that historical
research must involve a penetrating analysis of a limited problem rather
than a superficial examination of a broad area. The weapon of research is
the target pistol, not the shotgun.

Since historians may not have lived during the time they are studying and
may be removed from the events they investigate, they must often depend upon
inference and logical analysis, using the recorded experience of others
rather than direct observation. To ensure that their information is as
trustworthy as possible, they must rely on primary, or firsthand, accounts.
Finding appropriate primary sources of data requires imagination, hard work,
and resourcefulness.
Historians must also keep in mind the context in which the events 
being studied occurred and were recorded. It is necessary to keep the biases 
and beliefs of those who recorded the events in mind, as well as the social 
and political climate in which they wrote. 


SOURCES OF DATA 
Historical data are usually classified into two main categories: 


1. Primary sources are eyewitness accounts. They are reported by an 
actual observer or participant in an event. 

2. Secondary sources are accounts of an event that were not actually 
witnessed by the reporter. The reporter may have talked with an 
actual observer or read an account by an observer, but his or her 
testimony is not that of an actual participant or observer. Secondary 
sources may sometimes be used, but because of the distortion in pass- 
ing on information, the historian uses them only when primary data 
are not available. 


Primary Sources of Data 


Documents. Documents are the records kept and written by actual 
participants in, or witnesses of, an event. These sources are produced for 
the purpose of transmitting information to be used in the future. Docu- 
ments classified as primary sources are constitutions, charters, laws, court 
decisions, official minutes or records, autobiographies, letters, diaries, ge- 
nealogies, census information, contracts, deeds, wills, permits, licenses, af- 
fidavits, depositions, declarations, proclamations, certificates, lists, hand- 
bills, bills, receipts, newspaper and magazine accounts, advertisements, maps, 
diagrams, books, pamphlets, catalogs, films, pictures, paintings, inscrip- 
tions, recordings, transcriptions, and research reports. 


Remains or relics. Remains or relics are objects associated with a per- 
son, group, or period. Fossils, skeletons, tools, weapons, food, utensils, 
clothing, buildings, furniture, pictures, paintings, coins, and art objects are 
examples of those relics and remains that were not deliberately intended 
for use in transmitting information or as records. However, these sources 
may provide clear evidence about the past. The contents of an ancient 
burial place, for instance, may reveal a great deal of information about the 
way of life of a people—their food, clothing, tools, weapons, art, religious
beliefs, means of livelihood, and customs. Similarly, the contents of an 
institution for the mentally ill or mentally retarded can reveal a good deal 
of information about the way the clients were treated, including the quality 
of food, the opportunity for work and recreational activities, and whether 
abuses regularly occurred. 


Oral testimony. Oral testimony is the spoken account of a witness of, 
or participant in, an event. This evidence is obtained in a personal interview 
and may be recorded or transcribed as the witness relates his or her ex- 
periences. 


Primary Sources of Educational Data 


Many of the old materials mentioned in the preceding section provide 
primary evidence that may be useful specifically in studying the history of 
education. A number are listed here. 


Official records and other documentary materials. Included in this cate- 
gory are records and reports of legislative bodies and state departments of 
public instruction, city superintendents, principals, presidents, deans, de- 
partment heads, educational committees, minutes of school boards and 
boards of trustees, surveys, charters, deeds, wills, professional and lay pe- 
riodicals, school newspapers, annuals, bulletins, catalogs, courses of study, 
curriculum guides, athletic game records, programs (for graduation,
dramatic, musical, and athletic events), licenses, certificates, textbooks,
examinations, report cards, pictures, drawings, maps, letters, diaries,
autobiographies, teacher and pupil personnel files, samples of student work,
and recordings.


Oral testimony. Included here are interviews with administrators, 
teachers and other school employees, students and relatives, school patrons 
or lay citizens, and members of governing bodies. 


Relics. Included in this category are buildings, furniture, teaching 
materials, equipment, murals, decorative pictures, textbooks, examinations, 
and samples of student work. 


Secondary Sources of Data 


Secondary sources are the reports of a person who relates the testimony
of an actual witness of, or participant in, an event. The writer of the
secondary source was not on the scene of the event, but merely reports 
what the person who was there said or wrote. Secondary sources of data 
are usually of limited worth for research purposes because of the errors 
that may result when information is passed on from one person to another.
Most history textbooks and encyclopedias are examples of secondary sources,
for they are often several times removed from the original, firsthand
account of events.

Some types of material may be secondary sources for some purposes
and primary sources for others. For example, a high school textbook in
American history is ordinarily a secondary source. But if one were making
a study of the changing emphasis on nationalism in high school American
history textbooks, the book would be a primary document or source of
data.


HISTORICAL CRITICISM 


It has been noted that the historian does not often use the method of direct 
observation. Past events cannot be repeated at will. Because the historian 
must get much of the data from the reports of those who witnessed or 
participated in these events, the data must be carefully analyzed to sift the 
true from the false, irrelevant, or misleading. 

Trustworthy, usable data in historical research are known as historical 
evidence. That body of validated information can be accepted as a trust- 
worthy and proper basis for the testing and interpretation of a hypothesis. 
Historical evidence is derived from historical data by the process of criti- 
cism, which is of two types: external and internal. 


External Criticism 


External criticism establishes the authenticity or genuineness of data. Is the 
relic or document a true one rather than a forgery, a counterfeit, or a 
hoax? Various tests of genuineness may be employed. 

Establishing the age or authorship of documents may require intricate 
tests of signature, handwriting, script, type, spelling, language usage, doc- 
umentation, knowledge available at the time, and consistency with what is 
known. It may involve physical and chemical tests of ink, paint, paper, 
parchment, cloth, stone, metals, or wood. Are these elements consistent 
with known facts about the person, the knowledge available, and the tech- 
nology of the period in which the remain or the document originated? 


Internal Criticism 


After the authenticity of historical documents or relics has been established, 
there is still the problem of evaluating their accuracy or worth. Although 
they may be genuine, do they reveal a true picture? What of the writers 
or creators? Were they competent, honest, unbiased, and actually ac- 
quainted with the facts, or were they too antagonistic or too sympathetic 
to give a true picture? Did they have any motives for distorting the account?
Were they subject to pressure, fear, or vanity? How long after the event
did they make a record of their testimony, and were they able to remember
accurately what happened? Were they in agreement with other competent
witnesses?

These questions are often difficult to answer, but the historian must 
be sure that the data are authentic and accurate. Only then may he or she 
introduce them as historical evidence, worthy of serious consideration. 

The following examples describe ways in which evidence is tested for 
authenticity. The first is an example of historical criticism of a scholarly 
type, carried on by scientists and biblical scholars, in which historic docu- 
ments were proven to be genuine. 


The Dead Sea Scrolls. One of the most interesting and significant 
historical discoveries of the past century was the finding of the Dead Sea 
Scrolls. This collection of ancient manuscripts was discovered in 1947 by 
a group of Bedouins of the Ta'amere tribe. Five leather scrolls were found, 
sealed in tall earthenware jars in the Qumran caves near Ain Feshkha, on
the northwest shore of the Dead Sea (Davies, 1956).*

The Bedouins took the scrolls to Metropolitan Mar Athanesius Yeshue
Samuel, of St. Mark's monastery in Jerusalem, who purchased them after
discovering that they were written in ancient Hebrew. A consultation with
biblical scholars confirmed the fact that they were very old and possibly
valuable. They were later purchased by Professor Sukenik, an archaeologist
of Hebrew University at Jerusalem, who began to translate them. He also
had portions of the scrolls photographed to send to other biblical scholars
for evaluation. Upon examining some of the photographs, Dr. William F.
Albright of Johns Hopkins University pronounced them "the greatest
manuscript discovery of modern times."

A systematic search of the Wadi Qumran area caves in 1952 yielded
other leather scrolls, many manuscript fragments, and two additional scrolls
of copper that were so completely oxidized that they could not be unrolled
without being destroyed. By 1956, scientists at the University of
Manchester, England, had devised a method of passing a spindle through the scrolls,
spraying them with aircraft glue, baking them, and then sawing them across
their rolled-up length to yield strips which could be photographed.

The origin, the age, and the historic value of the scrolls have been
questioned. By careful and systematic external and internal criticism,
however, certain facts have been established and are quite generally accepted
by biblical scholars and scientists.


* From The Meaning of the Dead Sea Scrolls (New York: New American Library
of World Literature, 1956), p. 9. Used with permission of the publisher.

The scrolls are very old, probably dating back to the first century A.D.
They are written in ancient Hebrew and probably originated in a
pre-Christian monastery of one of the Jewish sects. The writings contain two
versions (one complete and one incomplete) of the Book of Isaiah, a
commentary or Midrash on the Book of Habakkuk, a set of rules of the ancient
Jewish monastery, a collection of about twenty psalms similar to those of 
the Old Testament, and several scrolls of apocalyptic writings, similar to 
the Book of Revelation. 

The contents of the copper scrolls and other fragments have now 
been translated. It is possible that more scrolls and writings may be dis- 
covered in the area, and it is likely that these ancient documents may throw 
new light on the Bible and the origins of Christianity. 

It is interesting to note how these documents were authenticated, 
dated, and evaluated by: 


1. Paleography, an analysis of the Hebrew alphabet forms used. These
written characters were similar to those observed in other documents
known to have been written in the first century.

2. A radiocarbon test of the age of the linen scroll covering conducted
by the Institute of Nuclear Research at the University of Chicago. All
organic matter contains radiocarbon 14, which is introduced by the
interaction of cosmic rays from outer space with the nitrogen in the
earth’s atmosphere. Radiocarbon is taken up constantly throughout
the life of the specimen; this uptake ceases at death, and the
radiocarbon already present then disintegrates at a constant known
rate. At the time of death, all organic matter yields
15.3 disintegrations per minute per gram of carbon content. The
number of disintegrations is reduced by one-half after 5568 years,
plus or minus 30 years. By measuring disintegrations with a Geiger-type
counter, it is possible to estimate the age of specimens within
reasonable limits of accuracy. Through use of this technique, the date
of the scrolls was estimated at A.D. 33, plus or minus 200 years.

3. Careful examination of the pottery form in which the scrolls were 
sealed. These jars, precisely shaped to fit the manuscripts, were the 
type commonly used during the first century. 

4. Examination of coins found in the caves with the scrolls. These dated 
Roman coins provided convincing evidence of the age of the scrolls. 

5. Translation of the scrolls. When translated, the scrolls proved comparable to
other writings, both biblical and nonbiblical, of known antiquity.
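
The decay arithmetic described in item 2 can be made concrete with a short
sketch. It assumes simple exponential decay and uses only the two figures
quoted above (an initial rate of 15.3 disintegrations per minute per gram
and a half-life of 5568 years). The function names and the sample age are
hypothetical illustrations, not measurements from the scroll study, and the
sketch ignores the counting error and corrections a laboratory would apply.

```python
import math

# Figures quoted in the text; everything else here is illustrative.
INITIAL_RATE = 15.3        # disintegrations per minute per gram at death
HALF_LIFE_YEARS = 5568.0   # years for the rate to fall by one-half


def expected_rate(age_years, initial_rate=INITIAL_RATE,
                  half_life=HALF_LIFE_YEARS):
    """Rate expected from a specimen that died age_years ago."""
    if age_years < 0:
        raise ValueError("age_years must not be negative")
    # The rate halves once for every half-life that has elapsed.
    return initial_rate * 0.5 ** (age_years / half_life)


def estimate_age_years(measured_rate, initial_rate=INITIAL_RATE,
                       half_life=HALF_LIFE_YEARS):
    """Invert the decay law: years elapsed since the specimen died."""
    if not 0 < measured_rate <= initial_rate:
        raise ValueError("measured_rate must be in (0, initial_rate]")
    # Number of half-lives elapsed = log2(initial_rate / measured_rate).
    return half_life * math.log(initial_rate / measured_rate) / math.log(2)


if __name__ == "__main__":
    # Hypothetical specimen roughly 1,900 years old.
    rate = expected_rate(1900)
    print(f"expected rate: {rate:.2f} disintegrations/min/g")
    print(f"recovered age: {estimate_age_years(rate):.0f} years")
```

Under these assumptions a specimen about 1,900 years old would register
about 12 disintegrations per minute per gram, which shows why a small error
in counting translates into an uncertainty of many years in the estimated
date.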


Although external criticism has now produced convincing evidence 
of the genuineness and age of the Dead Sea Scrolls, internal criticism of 
their validity and relevance will be pursued by biblical scholars for many 
years to come and may provide many new hypotheses concerning biblical 
writings and the early history of Christianity and the pre-Christian Jewish
sects. 

Modern approaches to historical research have applied advanced tech- 
nology, emphasizing the usefulness of both qualitative and quantitative 
data. As we have seen in this example, researchers employed the
radiocarbon 14 test to verify the authenticity of the scrolls. The next example
illustrates the use of the computer in archaeological and historical research. 


Stonehenge (Hanging Stones). For centuries historians and archaeol- 
ogists have debated the origin and purpose of Stonehenge, a curious ar- 
rangement of stones and archways, each weighing more than 40 tons, 
located on the Salisbury Plain about 90 miles southwest of London. From 
the beginning of recorded history, writers have speculated about the stones. 
Their construction and arrangement have been attributed to many tribes 
and national groups who invaded or inhabited England. Modern radio- 
carbon dating of a deer antler found in the stone fill seems to date their 
erection at about 1900 to 1600 B.C. Their purpose has been explained in
many legends—a city of the dead, a place of human sacrifice, a temple of 
the sun, a pagan cathedral, and a Druid ceremonial place. 

More recently some scientists and historians have suggested that 
Stonehenge was a type of astronomical computer calendar used by early 
Britons who were apparently sophisticated enough to compute the position 
of the sun and the moon at their various stages. Using an IBM 704 com- 
puter, Gerald S. Hawkins, an astronomer at the Smithsonian Astrophysical 
Observatory at Cambridge, Massachusetts, entered into the computer 240 
stone alignments, translated into celestial declinations. Accomplishing in 
less than a minute a task that would have required more than 4 months of 
human calculator activity, the computer compared the alignments with the 
precise sun/moon extreme positions as of 1500 B.C. and indicated that they
matched with amazing accuracy. 

Hawkins suggests that the stone arrangements may have been created 
for several possible reasons: They made a calendar that would be useful 
for planting crops; they helped to create and maintain priestly power, by 
enabling the priest to call out the people to see the rise and setting of the 
midsummer sun and moon over the heel stone and midwinter sunset through
the great trilithon; or possibly they served as an intellectual exercise.
Hawkins concludes:


In any case, for whatever reasons those Stonehenge builders built as they did, 
their final completed creation was a marvel. As intricately aligned as an 
interlocking series of astronomical instruments (which indeed it was) and yet 
architecturally perfectly simple, in function subtle and elaborate, in
appearance stark, imposing, awesome, Stonehenge was a thing of surpassing
ingenuity of design, variety of usefulness and grandeur—in concept and
construction an eighth wonder of the world. (Hawkins & White, 1966, pp. 117-)


This interesting historical-archaeological controversy illustrates the
use of sophisticated computer technology to test a hypothesis.


Examples of Topics for Educational Historical Study 


Brickman (1982) provides a number of possible topics by types of historical 
research in education and an example for each. We repeat his list here: 


1. PERIOD. “Education during the First Half of the Fifteenth Century.”

2. GEOGRAPHICAL REGION. “German Education under Frederick
the Great.”

3. EDUCATIONAL LEVEL. “The Secondary Schools of Ancient Rome.”

4. INSTITUTION. “Amherst College in the Nineteenth Century.”

5. BIOGRAPHY. “Bronson Alcott as an Educator.” Biographical detail,
as such, is of less importance for term-report purposes than an
exposition of the man’s educational ideas, work, and influence.

6. INNOVATIONS. “Three Decades of Audio-Visual Education.”

7. PHILOSOPHY. “Changing Concepts of American Higher Education
in the Nineteenth Century.”

8. METHODOLOGY. “Herbartianism in American Educational
Practice.”

9. CURRICULUM. “The Subject of Rhetoric in Ancient Greece.”

10. PERSONNEL. “The Role of the Teacher during the Renaissance.”

11. CHILDREN. “Changing Attitudes toward Corporal Punishment of
Children in the United States.”

12. LEGISLATION. “Compulsory School Attendance Laws in Prussia
During the Eighteenth Century.”

13. MATERIALS. “The Evolution of American School Readers,
1700-1830.”

14. NONSCHOOL AGENCIES. “The Development of the Library in
Nineteenth-century America.”

15. ORGANIZATIONS. “History of the Public School Society of New
York.”

16. FINANCE. “Methods of School Taxation in Pennsylvania, 1820-1880.”

17. ARCHITECTURE. “The Evolution of the School Building in
Illinois.”

18. ADMINISTRATION. “The Rise of the State Superintendency of
Schools.”

19. LITERATURE. “A Century of Educational Periodicals in the United
States.”

20. INFLUENCE. “The Influence of Rousseau upon Pestalozzi.”

21. REPUTATION. “The Reception of Horace Mann's Educational Ideas
in Latin America.”

22. COMPARISON. “A Comparative Study of Renaissance Theories of
the Education of the Prince.”

23. TEXTBOOK ANALYSIS. “A Study of the Treatment of Primitive
Education in Textbooks in Educational History” (pp. 5-6).*

Obviously, these topics are too broad for a student project and, in
some cases, would probably take most of a career. The processes of
delimitation and hypothesis formation are needed to make these topics useful.


WRITING THE HISTORICAL REPORT 


No less challenging than research itself is the writing of the report, which
calls for creativity in addition to the qualities of imagination and
resourcefulness already illustrated. It is an extremely difficult task to take
often seemingly disparate pieces of information and synthesize them into a
meaningful whole. Research reports should be written in a dignified and
objective style. However, the historian is permitted a little more freedom in
reporting. Hockett suggests that “the historian is not condemned to a bald,
plain, unattractive style” and that “for the sake of relieving the monotony of
statement after statement of bare facts, it is permissible, now and then, to
indulge in a bit of color.” He concludes, however, by warning that “above
all, embellishments must never become a first aim, or be allowed to hide
or distort the truth” (Hockett, 1948, p. 139).

An evaluation of graduate students’ historical-research projects
generally reveals one or more of the following faults:


1. Problem too broadly stated

2. Tendency to use easy-to-find secondary sources of data rather than
sufficient primary sources, which are harder to locate but usually more
trustworthy

3. Inadequate historical criticism of data because of failure to establish
authenticity of sources and trustworthiness of data. For example, there
is often a tendency to accept the truth of a statement if several
observers agree. It is possible that one may have influenced the others
or that all were influenced by the same inaccurate source of
information.

4. Poor logical analysis resulting from: 


a. Oversimplification—failure to recognize the fact that causes of
events are more often multiple and complex than single and
simple

b. Overgeneralization on the basis of insufficient evidence, and false
reasoning by analogy, basing conclusions upon superficial
similarities of situations

c. Failure to interpret words and expressions in the light of their
accepted meaning in an earlier period

d. Failure to distinguish between significant facts in a situation and
those that are irrelevant or unimportant

e. Failure to consider the documents in the context of their time,
that is, the existing beliefs, biases, and so forth.

5. Expression of personal bias, as revealed by statements lifted out of
context for purposes of persuasion, assuming too generous or
uncritical an attitude toward a person or idea (or being too unfriendly
or critical), excessive admiration for the past (sometimes known as the
"old oaken bucket" delusion), or an equally unrealistic admiration for
the new or contemporary, assuming that all change represents
progress

6. Poor reporting in a style that is dull and colorless, too flowery or
flippant, too persuasive or of the "soap-box" type, or improper in
usage


It is apparent that historical research is difficult and demanding. The
gathering of historical evidence requires long hours of careful examination
of such documents as court records, records of legislative bodies, letters,
diaries, official minutes of organizations, or other primary sources of data.
Historical research may involve traveling to distant places to examine the
necessary documents or relics. In fact, any significant historical study would
make demands that few students have the time, financial resources,
patience, or expertise to meet. For these reasons, good historical studies are
not often attempted for the purpose of meeting academic degree
requirements.


* Used with the permission of Emeritus, Inc., publisher.


SUMMARY


History, the meaningful record of human achievement, helps us to under- 
stand the present and, to some extent, to predict the future. Historical 
research is the application of scientific method to the description and anal- 
ysis of past events. 

Historians ordinarily draw their data from the observations and
experience of others. Because they are not likely to have been at the scene
of the event, they must use logical inferences to supplement what is
probably an incomplete account.

Primary sources may be "unconscious" testimony, not intended to be
left as a record—relics or remains such as bones, fossils, clothing, food,
utensils, weapons, coins, and art objects are useful. Conscious testimony,
in the form of records or documents, is another primary source of
information—examples are constitutions, laws, court decisions, official minutes,
autobiographies, letters, contracts, wills, certificates, newspaper and
magazine accounts, films, recordings, and research reports.

Historical criticism is the evaluation of primary data. External criticism
is concerned with the authenticity or genuineness of remains or documents,
and internal criticism is concerned with the trustworthiness or veracity of
materials. The accounts of the Dead Sea Scrolls and Stonehenge illustrate
the processes of historical criticism.

The historical research studies of graduate students often reveal
serious limitations. Frequently encountered are such faults as stating the
problem too broadly, inadequate primary sources of data, unskillful
historical criticism, poor logical analysis of data, personal bias, and ineffective
reporting.


EXERCISES


1. Write a proposal for a historical study in a local setting. You may select a 
community, school, church, religious or ethnic group, or individual. State an 
appropriate title, present your hypothesis, indicate the primary sources of data 
that you would search, and tell how you would evaluate the authenticity and 
validity of your data. 

2. Select a thesis of the historical type from the university library and analyze it
in terms of:

a. hypothesis proposed or questions raised

b. primary and secondary sources of data used

c. external and internal criticism employed

d. logical analysis of data relationships

e. soundness of conclusions

f. documentation


ASSESSMENT,
EVALUATION, 
AND DESCRIPTIVE 
RESEARCH 


A descriptive study describes and interprets what is. It is concerned with 
conditions or relationships that exist, opinions that are held, processes that 
are going on, effects that are evident, or trends that are developing. It is 
primarily concerned with the present, although it often considers past 
events and influences as they relate to current conditions. 

The term descriptive study conceals an important distinction, for not 
all descriptive studies fall into the category of research. In Chapter 1 the 
similarities and differences between assessment, evaluation, and research 
were briefly discussed. We will restate those similarities and differences in 
this discussion of descriptive studies. 

Assessment describes the status of a phenomenon at a particular time. 
It describes without value judgment a situation that prevails; it attempts 
no explanation of underlying reasons and makes no recommendations for 
action. It may deal with prevailing opinion, knowledge, practices, or con- 
ditions. As it is ordinarily used in education, assessment describes the prog- 
ress students have made toward educational goals at a particular time. For 
example, in the National Assessment of Educational Progress program, the
data are gathered by a testing program and a sampling procedure in such 
a way that no individual is tested over the entire test battery. It is not 
designed to determine the effectiveness of a particular process or program 
but merely to estimate the degree of achievement of a large number of
individuals who have been exposed to a great variety of educational and
environmental influences. It does not generally provide recommendations, 
but there may be some implied judgment on the satisfactoriness of the 
situation or the fulfillment of society's expectations. 

Evaluation is a process used to determine what has happened during 
a given activity or in an institution. The purpose of evaluation is to see if 
a given program is working, an institution is successful according to the 
goals set for it, or the original intent is being successfully carried out. To 
assessment, evaluation adds the ingredient of value judgment of the social 
utility, desirability, or effectiveness of a process, product, or program, and 
it sometimes includes a recommendation for some course of action. School 
surveys are usually evaluation studies; educational products and programs 
are examined to determine their effectiveness in meeting accepted objec- 
tives, often with recommendations for constructive action. 

Descriptive research, sometimes known as nonexperimental or corre- 
lational research, deals with the relationships between variables, the testing 
of hypotheses, and the development of generalizations, principles, or the- 
ories that have universal validity. It is concerned with functional relation- 
ships. The expectation is that if variable A is systematically associated with 
variable B, prediction of future phenomena may be possible and the results 
may suggest additional or competing hypotheses to test. 

In carrying out a descriptive research project, in contrast to an ex- 
periment, the researcher does not manipulate the variable, decide who 
receives the treatment, or arrange for events to happen. In fact, the events 
that are observed and described would have happened even though there 
had been no observation or analysis. Descriptive research also involves 
events that have already taken place and may be related to a present con- 
dition. 

The method of descriptive research is particularly appropriate in the 
behavioral sciences because many of the types of behavior that interest the 
researcher cannot be arranged in a realistic setting. Introducing significant 
variables may be harmful or threatening to human subjects. Ethical con- 
siderations often preclude exposing human subjects to harmful manipu- 
lation. For example, it would be unthinkable for an experimenter to ran- 
domly decide who should smoke cigarettes and who should not smoke 
them for the purpose of studying the effect of smoking on cancer, heart 
disease, or other illnesses thought to be caused by cigarette smoke. Similarly, 
to deliberately arrange auto accidents, except when manikins are used, in 
order to evaluate the effectiveness of seat belts or other restraints in pre- 
venting serious injury would be absurd. 

Although many experimental studies of human behavior can be ap- 
propriately carried out both in the laboratory and in the field, the prevailing 
research method of the behavioral sciences is descriptive. Under the
conditions that naturally occur in the home, the classroom, the recreational
center, the office, or the factory, human behavior can be systematically 
examined and analyzed. 

The many similarities between these types of descriptive studies may 
have tended to cloud the distinctions between them. They are all
characterized by disciplined inquiry, which requires expertise, objectivity, and
careful execution. They all develop knowledge, adding to what is already
known. They use similar techniques of observation, description, and
analysis. The differences between them lie in the motivation of the investigator,
the treatment of the data, the nature of the possible conclusions, and the
use of the findings. The critical distinctions are that the three types of
studies have different purposes and, therefore, approach the problem dif- 
ferently and that only descriptive research studies lead to generalizations 
beyond the given sample and situation. 

It is also possible for a single study to have multiple purposes. For 
instance, a study may evaluate the success/failure of an innovative program
and also include sufficient controls to qualify as a descriptive research study. 
Similarly, an assessment study may include elements that result in descrip- 
tive research too. 

Examples of these three types of descriptive studies are presented 
next. It is important to keep in mind that, while these examples are pre- 
sented to illustrate each individual type of study (assessment, evaluation, 
or descriptive research), they are not mutually exclusive. That is, for ex- 
ample, surveys are also used in descriptive research and case studies are 
also used in assessment studies. 


ASSESSMENT STUDIES 


The Survey 


The survey method gathers data from a relatively large number of cases 
at a particular time. It is not concerned with characteristics of individuals 
as individuals. It is concerned with the generalized statistics that result when
data are abstracted from a number of individual cases. It is essentially cross-
sectional.

Ninety-four percent of American homes have at least one television 
set. About three out of five students who enter the American secondary 
school remain to graduate. Fifty-six percent of adult Americans voted in 
the 1972 presidential election. The average American consumes about 103 
pounds of refined sugar annually. The ratio of female births to male births 
in the United States in 1974 was 946 to 1000. The population of Illinois, 
according to the 1980 census, was 11,426,518. Data like these result from 
many types of surveys. Each statement pictures a prevailing condition at a 
particular time. 

In analyzing political, social, or economic conditions, one of the first
steps is to get the facts about the situation—or a picture of conditions that 
prevail or that are developing. Some of these data may be gathered from surveys
of the entire population. Others are inferred from a study of a sample 
group, carefully selected from the total population. And at times, the survey 
may describe a limited population which is the only group under consid- 
eration. 

The survey is an important type of study. It must not be confused 
with the mere clerical routine of gathering and tabulating figures. It in- 
volves a clearly defined problem and definite objectives. It requires expert 
and imaginative planning, careful analysis and interpretation of the data
gathered, and logical and skillful reporting of the findings.


Social Surveys 


In the late 1930s a significant social survey was directed by the Swedish 
sociologist Gunnar Myrdal and sponsored by the Carnegie Foundation. 
Myrdal and his staff of researchers made a comprehensive analysis of the 
social, political, and economic life of black persons in the United States, 
yielding a great mass of data on race relations in America (Myrdal, 1944). 

The late Alfred Kinsey (1948) of Indiana University made a com- 
prehensive survey of the sexual behavior of the human male, based on data 
gathered from more than 12,000 cases. His second study (Kinsey, 1953) 
of the behavior of the human female followed later. Although these studies 
have raised considerable controversy, they represent a scientific approach 
to the study of an important social problem and have many implications 
for jurists, legislators, social workers, and educators. 

Witty (1967) has studied the television viewing habits of school chil- 
dren, and has published annual reports on his investigations. These studies 
were conducted in the Chicago area and indicate the amount of time de- 
voted to viewing and the program preferences of elementary and secondary 
students, their parents, and their teachers. Witty attempted to relate tele- 
vision viewing to intelligence, reading habits, academic achievement, and 
other factors. 

Shaw and McKay (1942) conducted a study of juvenile delinquency 
in Chicago yielding significant data on the nature and extent of delinquency 
in large urban communities. 

Lang and Kahn (1986) examined special education teacher estimates 
of their students' criminal acts and crime victimizations. The data indicated 
that special education students seem to be victimized in the same way as 
others but to a greater degree. This preliminary study led to Lang's (1987) 
dissertation, an experiment aimed at reducing the rate of victimization of 
mentally retarded students. 

The National Safety Council conducts surveys on the nature, extent, 
and causes of automobile accidents in all parts of the United States. State
high school athletic associations conduct surveys on the nature and extent 
of athletic injuries in member schools. 


Public Opinion Surveys


In our culture, where so many opinions on controversial subjects are ex- 
pressed by well-organized special interest groups, it is important to find 
out what the people think. Without a means of polling public opinion, the 
views of only the highly organized minorities are effectively presented 
through the printed page, radio, and television.

How do people feel about legalized abortion, the foreign aid program,
busing to achieve racial integration in the public schools, or the adequacy 
of the public schools? What candidate do they intend to vote for in the 
next election? Such questions can be partially answered by means of the 
public opinion survey. Many research agencies carry on these surveys and 
report their findings in magazines and in syndicated articles in daily
newspapers.

Since it would be impracticable or even impossible to get an expression
of opinion from every person, sampling techniques are employed in such 
a way that the resulting opinions of a limited number of people can be 
used to infer the reactions of the entire population. 

The names Gallup, Roper, and Harris are familiar to newspaper read- 
ers in connection with public opinion surveys. These surveys of opinion 
are frequently analyzed and reported by such classifications as age groups, 
sex, educational level, occupation, income level, political affiliation, or area 
of residence. Researchers are aware of the existence of many publics, or 
segments of the public, who may hold conflicting points of view. This 
further analysis of opinion by subgroups adds meaning to the analysis of 
public opinion in general. 

Those who conduct opinion polls have developed more sophisticated 
methods of determining public attitudes through more precise sampling 
procedures and by profiting from errors that plagued early efforts. In 
prediction of voter behavior several well-known polls have proved to be 
poor estimators of election results. 

In 1936, a prominent poll with a sample of over 2 million voters 
predicted the election of Alfred Landon over President Roosevelt by nearly 
15 percentage points. The primary reason for this failure in prediction was 
the poll's sampling procedure. The sample was taken from telephone di- 
rectories and automobile registration lists which did not adequately rep- 
resent poor persons, who in this election voted in unprecedented numbers. 
Gallup, on the other hand, correctly predicted that Roosevelt would win, 
using a new procedure, quota sampling, in which various components of the 
population are included in the sample in the same proportion that they 
are represented in the population. However, there are problems with this
procedure, which resulted in Gallup and others being wrong in 1948 (Bab- 
bie, 1973). 

In the 1948 election campaign most polls predicted the election of 
Thomas E. Dewey over President Truman. This time the pollsters were
wrong, perhaps partly because of the sampling procedure and partly be- 
cause the polls were taken too far before the election despite a trend toward 
Truman throughout the campaign. Had the survey been made just prior 
to election day, a more accurate prediction might have resulted. In addition, 
most survey researchers (including pollsters) use probability sampling today 
instead of quota sampling. This results in all members of a given population
having the same probability of being chosen for the sample. In the 1968 
election the predictions of both Gallup and Harris polls were less than 2 
percentage points away from Richard Nixon's actual percent of the vote 
with samples of only about 2000 voters. This accuracy was possible due to 
the use of probability sampling (Babbie, 1973). 
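
The link between probability sampling and the accuracy of a small sample can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the electorate of 100,000 voters, the 45 percent share, and the function names are hypothetical, and the margin-of-error formula assumes a simple random sample rather than the more complex designs that national polls typically use.

```python
import math
import random

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p from n cases."""
    return z * math.sqrt(p * (1 - p) / n)

def sample_share(population, n, rng):
    """Draw a simple random sample of n voters and return the share coded 1."""
    sample = rng.sample(population, n)  # every voter is equally likely to be drawn
    return sum(sample) / n

# Hypothetical electorate: 100,000 voters, 45 percent favoring candidate A (coded 1).
population = [1] * 45_000 + [0] * 55_000

rng = random.Random(1968)  # fixed seed so the run is repeatable
estimate = sample_share(population, 2000, rng)
print(f"estimate = {estimate:.3f} +/- {margin_of_error(estimate, 2000):.3f}")
```

With 2000 cases the margin of error is roughly 2 percentage points, which is in line with the accuracy described above. No sample size can repair a sampling frame that omits part of the population, as the 1936 poll showed.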

In addition to the limitations suggested, there is a hazard of careless 
responses, given in an offhand way, that are sometimes at variance with 
the more serious opinions that are expressed as actual decisions. 

Since 1969 the Gallup organization has conducted an annual nation- 
wide opinion poll of public attitudes toward education. Using a stratified 
cluster sample of 1500 or more individuals over 18 years of age, the data 
have been gathered by personal interviews from seven geographic areas 
and four size-of-community categories. The responses were analyzed by 
age, sex, race, occupation, income level, political affiliation, and level of 
education. A wide range of problem areas has been considered: In the 
1975 poll such problem areas confronting education were the use of drugs 
and alcohol; programs on drugs or alcohol; behavior standards in the 
schools; policies on suspension from school; work required of students, 
including amount of homework; requirements for graduation from high 
school; federal aid to public schools; the nongraded school program; open 
education; alternative schools; job training; right of teachers to strike; text-
book censorship; and the role of the school principal as part of management
(Elam, 1979). The 1982 poll indicated the public's clear support for edu-
cation. Education was ranked first among twelve funding categories con-
sidered in the survey, above health care, welfare, and military defense,
with 55 percent selecting public education as one of their first three choices
(Nation at Risk, 1983, p. 17). 


National Assessment of Educational Progress 


The National Assessment of Educational Progress was the first nationwide, 
comprehensive survey of educational achievement to be conducted in the 
United States. Originally financed by the Carnegie Foundation and the 
Fund for the Advancement of Education, with a supporting grant from
the U.S. Office of Education, the Committee on Assessing the Progress of 
Education (CAPE) began its first survey in the spring of 1969. It gathered 
achievement test data by a sampling process such that no one individual 
was tested over the whole test battery or spent more than 40 minutes in 
the process. Achievement was assessed every 3 years in four age groups 
(9, 13, 17, and young adults between 26 and 35), in four geographical areas 
(Northeast, Southeast, Central, and West), for four types of communities 
(large city, urban fringe, rural, and small city), and for several socioeco- 
nomic levels and ethnic groups. 
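
The idea that no individual takes the whole battery, often called matrix sampling, can be sketched in a few lines. The example below is a hypothetical illustration, not the procedure actually used by the assessment: the 12 exercises, 4 booklets, 200 students, and function names are all assumptions made for the sketch.

```python
import random

def split_items(item_ids, n_booklets):
    """Deal the item pool into n_booklets non-overlapping booklets."""
    return [item_ids[b::n_booklets] for b in range(n_booklets)]

def assign_booklets(student_ids, n_booklets, rng):
    """Give each student exactly one randomly chosen booklet."""
    return {sid: rng.randrange(n_booklets) for sid in student_ids}

# Hypothetical battery of 12 exercises split into 4 booklets of 3 exercises each.
items = [f"exercise_{i}" for i in range(1, 13)]
booklets = split_items(items, 4)

rng = random.Random(1969)  # fixed seed so the run is repeatable
assignment = assign_booklets(range(200), 4, rng)

# Each student answers only the exercises in one booklet; results are then
# reported for groups of students, never for a single individual.
students_per_booklet = [
    sum(1 for b in assignment.values() if b == k) for k in range(4)
]
print(students_per_booklet)
```

Because each exercise is answered by a random subgroup, group-level results can be estimated for the whole battery while testing time per person stays short.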

Achievement has been assessed in art, reading, writing, social studies, 
science, mathematics, literature, citizenship, and music. Comparisons be- 
tween individuals, schools, or school systems have never been made. 

The agency now conducting the assessment is the National Assessment
of Educational Progress (NAEP), financed by the National Center for Ed-
ucational Statistics, a division of the Department of Education. Periodic
reports are provided for educators, interested lay adults, and for the gen- 
eral public through press releases to periodicals. 


International Assessment 


The International Association for the Evaluation of Educational Achieve-
ment, with headquarters in Stockholm, Sweden, has been carrying on an
assessment program in a number of countries since 1964. The first study,
The International Study of Achievement in Mathematics (Torsten, 1967), com- 
pared achievement in twelve countries: Austria, Belgium, England, Fin- 
land, France, West Germany, Israel, Japan, the Netherlands, Scotland, 
Sweden, and the United States. Short answer and multiple choice tests were 
administered to 13-year-olds and to students in their last year of the upper 
secondary schools, prior to university entrance. More than 132,000 pupils 
and 5000 schools were involved in the survey. Japanese students excelled 
above all others, regardless of their socioeconomic status, and United States 
students ranked near the bottom. 

Although the purpose of assessment is not to compare school systems, 
the data lead observers to make such comparisons. Critics of the first as- 
sessment pointed out the inappropriateness of comparing 17-year-olds in 
the United States, where more than 75 percent are enrolled in secondary 
schools, with 17-year-olds in other countries in which those enrolled in 
upper secondary schools comprise a small, highly selected population. 

More recent assessments reveal that, although 10 percent of the top 
United States students surpassed similar groups in all other countries in 
dere in science they occupied seventh place (Hechinger & Hechinger, 4).

Other assessments have been carried out and the number of partic-
ipating countries has been increased to twenty-two.


Activity Analysis


The analysis of the activities or processes that an individual is called upon 
to perform is important, both in industry and in various types of social 
agencies. This process of analysis is appropriate in any field of work and 
at all levels of responsibility. It is useful in the industrial plant, where needed 
skills and competencies of thousands of jobs are carefully studied, jobs 
ranging in complexity from unskilled laborer to plant manager. 

In school systems the roles of the superintendent, the principal, the 
teacher, and the custodian have been carefully analyzed to discover what 
these individuals do and need to be able to do. The Commonwealth Teacher 
Training Study (Charters & Waples, 1929) described and analyzed the 
activities of several thousand teachers, and searched previous studies for 
opinions of writers on additional activities in which classroom teachers 
should engage. A more recent study (Morris, Crowson, Porter-Gehrie, & 
Hurwitz, 1984) described and analyzed the activities of school principals. 
This study is described in some detail later in this chapter as an example 
of ethnographic research. 

This type of analysis may yield valuable information that would prove 
useful in establishing 


1. The requirements for a particular job or position

2. A program for the preparation or training of individuals for various 
jobs or positions 

3. An in-service program for improvement in job competence or for 
upgrading of individuals already employed 

4. Equitable wage or salary schedules for various jobs or positions. 


Trend Studies 


The trend study is an interesting application of the descriptive method. In 
essence it is based upon a longitudinal consideration of recorded data, 
indicating what has been happening in the past, what the present situation 
reveals, and on the basis of these data, what is likely to happen in the 
future. For example, if the population in an area shows consistent growth 
over a period of time, one might predict that by a certain date in the future 
the population will reach a given level. These assumptions are based upon 
the likelihood that the factors producing the change or growth will continue 
to exert their influence in the future. The trend study points to conclusions 
reached by the combined methods of historical and descriptive analysis, 
and is illustrated by Problems and Outlook of Small Private Liberal Arts Colleges: 
Report to the Congress of the United States by the Comptroller General (1978). In 
response to a questionnaire sent to 332 institutions, 283 furnished data on 
facility construction, loan repayments, enrollment, the effectiveness of 
methods used to attract more students, financial aid provided, and the
general financial health of their institutions.

Based upon past and present experience, such influences as the growth 
of the community college, the effect of inflation on operating costs, tuition,
living expenses and fees, and the decline in the number of college-age 
students were projected for the years 1978 to 1985 and their impact upon 
the financial stability of the small liberal arts college assessed. 
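
The projection step of a trend study can be sketched as fitting a straight line to recorded data and extending it to a future date. The example below is a minimal sketch under stated assumptions: the enrollment figures are hypothetical and not taken from the report, the function names are illustrative, and a straight line is only one of several ways to model a trend. The projection holds only if the factors producing the change continue to operate, as noted above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def project(xs, ys, future_x):
    """Extend the fitted trend line to a future value of x."""
    slope, intercept = fit_line(xs, ys)
    return intercept + slope * future_x

# Hypothetical enrollment counts for one institution (illustrative only).
years = [1970, 1972, 1974, 1976, 1978]
enrollment = [1000, 1040, 1080, 1120, 1160]

print(project(years, enrollment, 1985))
```

In this made-up series enrollment grows by 20 students a year, so the line projects 1300 students for 1985. Real data rarely fall on a straight line, and the further the projection reaches, the less it should be trusted.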

The following trend study topics would also be appropriate: 


1. The Growing Participation of Women in Intercollegiate Sports Pro-
grams

2. Trends in the Methods of Financial Support of Public Education 

3. The Growth of Black Student Enrollment in Graduate Study Pro- 
grams 


4. The Minimum Competency Requirement Movement in American 
Secondary Education 


EVALUATION STUDIES 


School Surveys 


What has traditionally been called a school survey is usually an assessment 
and evaluation study. Its purpose is to gather detailed information to be 
used as a basis for judging the effectiveness of the instructional facilities, 
curriculum, teaching and supervisory personnel, and financial resources 
in terms of best practices and standards in education. For example, profes-
sional and regional accrediting agencies send visitation teams to gather data
on the characteristics of the institution seeking accreditation. Usually, fol- 
lowing a self-evaluation by the school staff, the visiting educators evaluate 
the institution's characteristics on the basis of agency guidelines.

Many city, township, and county school systems have been studied by
this method for the purpose of determining status and adequacy. These 
survey-evaluations are sometimes carried on by an agency of a university 
in the area. Frequently a large part of the data is gathered by local educators, 
with the university staff providing direction and advisory services. 


Program Evaluation 


The most common use of evaluation is to determine the effectiveness of a 
program and sometimes the organization. The school surveys described above 
are evaluations only of the organization. Program evaluations, while often 
including the organization, focus primarily on program effectiveness re- 
sults. As Kaufman and Thomas (1980) put it: 

Evaluation deals with results, intended and unintended. The questions asked
during evaluation are usually the same. Regardless of the context, evaluation 
seeks to answer the following questions: 
1. What are the goals and objectives of the organization? 
2. What should be the goals and objectives of the organization? 
3. What results were intended by the program, project, activity, or 
organization? 
4. What results were obtained by the program, project, activity, or 
organization? 
5. What were the value and usefulness of the methods and means used 
to achieve the results? 
6. How well was the program, project, activity, or organization admin- 
istered and managed? 
7. What, if anything, about the program, project, activity, or organi-
zation should be changed?
8. What, if anything, about the program, project, activity, or organi-
zation should be continued?
9. Should the organization, project, program, or activity exist at all?


These questions are basic. They probe the issue of activities and the worth 
of these activities in terms of what they accomplished. 

Evaluation is more than testing or measuring; it includes asking and an-
swering basic questions about efforts and results. (pp. 1-2)¹


There are a number of evaluation models that evaluators use. Some 
models are actually research approaches to evaluation. Ruttman (1977)
used the term evaluation research to describe evaluation procedures that use 
rigorous research methodology. Other models are less rigorous. The model 
selected should depend on the purpose for the evaluation. Kaufman and 
Thomas (1980) describe eight possible models. 


ASSESSMENT AND EVALUATION IN PROBLEM SOLVING 


In solving a problem or charting a course of action, several sorts of infor- 
mation may be needed. These data may be gathered through assessment 
and evaluation methods. 

The first type of information is based upon present conditions. Where
are we now? From what point do we start? These data may be gathered 
by a systematic description and analysis of all the important aspects of the 
present situation. 

The second type of information involves what we may want. In what 
direction may we go? What conditions are desirable or are considered to 
represent best practice? This clarification of objectives or goals may come
from a study of what we think we want, possibly resulting from a study of
conditions existing elsewhere, or of what experts consider to be adequate
or desirable.

¹ Used with the permission of the authors.

The third type of information is concerned with how to get there. This 
analysis may involve finding out about the experience of others who have 
been involved in similar situations. It may involve the opinions of experts, 
who presumably know best how to reach the goal. 

Some studies emphasize only one of these aspects of problem solving. 
Others may deal with two, or even three, of the elements. Although a study 
does not necessarily embrace all the steps necessary for the solution of a 
problem, it may make a valuable contribution by clarifying only one of the
necessary steps, from description of present status to the charting of the
path to the goal. 

Assessment and evaluation methods may supply some or all of the 
needed information. An example will illustrate how they can be used to 
help solve an educational problem.

Washington Township has a school building problem. Its present 
educational facilities seem inadequate, and if present developments con- 
tinue, conditions may be much worse in the future. The patrons and ed-
ucational leaders in the community know that a problem exists, but they
realize that this vague awareness does not provide a sound basis for action. 
Three steps are necessary to provide such a basis. 

The first step involves a systematic analysis of present conditions. How 
many school-age children are there in the township? How many children 
are of preschool age? Where do they live? How many classrooms now exist? 
How adequate are they? What is the average class size? How are these 
present buildings located in relation to residential housing? How adequate 
are the facilities for food, library, health, and recreational services? What 
is the present annual budget? How is it related to the tax rate and the 
ability of the community to provide adequate educational facilities? 

The second step projects goals for the future. What will the school 
population be in 5, 10, or 20 years? Where will the children live? How 
many buildings and classrooms will be needed? What provisions should be 
made for special school services, for libraries, cafeterias, gymnasiums, and
play areas to take care of expected educational demands? 

Step three considers how to reach those goals which have been es-
tablished by the analysis of step two. Among the questions to be answered
are the following: Should existing facilities be expanded or new buildings
constructed? If new buildings are needed, what kind should be provided? 
Should schools be designed for grades 1 through 8, or should 6-year ele- 
mentary schools and separate 2- or 3-year junior high schools be provided? 
How will the money be raised? When and how much should the tax rate 
be increased? When should the construction program get underway? 

Many of the answers to the questions raised in step three will be arrived
at by analysis of practices of other townships, the expressed opinions of
school patrons and local educational leaders, and the opinions of experts
in the areas of school buildings, school organization, community planning,
and public finance. Of course, this analysis of school building needs is but 
one phase of the larger educational problem of providing an adequate 
educational program for tomorrow's children. There remain problems of 
curriculum, pupil transportation, and school personnel. These problems 
can also be attacked by using similar methods of assessment and evaluation. 


THE FOLLOW-UP STUDY 


The follow-up study investigates individuals who have left an institution 
after having completed a program, a treatment, or a course of study. The 
study is concerned with what has happened to them, and what has been 
the impact upon them of the institution and its program. By examining 
their status or seeking their opinions, one may get some idea of the ade- 
quacy or inadequacy of the institution's program. Which courses, experi- 
ences, or treatments proved to be of value? Which proved to be ineffective 
or of limited value? Studies of this type enable an institution to evaluate 
various aspects of its program in light of actual results. 

Dillon's (1949) study of early school leavers has yielded information 
that may lead to the improvement of the curriculum, guidance services, 
administrative procedures, and thus the holding power of the American 
secondary school. 

Project Talent (U.S. Office of Education, 1965) was an educational 
survey conducted by the University of Pittsburgh with support from the 
Cooperative Research Program of the U.S. Office of Education, the Na- 
tional Institutes of Health, the National Science Foundation, and the De- 
partment of Defense. The survey consisted of the administration of a 2- 
day battery of aptitude, ability, and achievement tests, and inventories of 
the background characteristics of 440,000 students enrolled in 1353 sec- 
ondary schools in all parts of the United States. Five basic purposes of the 
survey were stated: 


1. To obtain an inventory of the capacities and potentialities of American
youth 

2. To establish a set of standards for educational and psychological meas- 
urement 

3. To provide a comprehensive counseling guide indicating patterns of 
career success

4. To provide information on how youth choose their life work 

5. To provide better understanding of the educational experiences which 
prepare students for their life work. 

In addition to the testing program, questionnaire follow-up studies
have been conducted, and are planned at regular intervals, to relate the 
information gathered to patterns of aptitude and ability required by various 
types of occupations. The vast amount of data stored in the data bank, 
now available in the computer files, will make significant educational re- 
search possible and may provide a basis for possible changes in the edu- 
cational patterns of American secondary schools. 

Project Talent, described as an example of an educational survey, also 
provides an illustration of a follow-up study. One phase of the longitudinal 
study, reported by Combs and Cooley (1968), involved the follow-up of the
ninth-grade group who failed to complete the high school program. This 
group, which represented a random sample of the ninth-grade secondary 
school population, provided an estimate of the characteristics of the drop- 
out population, compared with those of a random sample of students who 
graduated but did not enter a junior college or 4-year institution of higher 
learning. These two samples were compared on a number of characteristics, 
such as academic achievement, participation in extracurricular activities, 
work experiences, hobbies, contacts with school counselors, and self-re- 
ported personal qualities. 

The students who graduated scored significantly higher on most of 
the characteristics, except self-reported qualities of leadership and impul-
siveness. One unusual finding indicated that the dropouts earned as much
as those who had finished high school and had been earning it longer. It
was pointed out, however, that the economic advantages of finishing high
school could not be adequately evaluated until later in life. 

Project Talent, funded by the National Institute of Education, main-
tained contact with the original students and has completed the eleventh-
year follow-up survey. Many of the students expressed dissatisfaction with
their schooling and regretted that they had not gone on to college or
vocational school and that they had married too early. More than half still
live within 30 miles of their high schools, a surprising observation in a
society that is believed to be extremely mobile. The more mobile half were
the high academic achievers. Eighty percent of the men, but only 65 percent
of the women, expressed satisfaction with their jobs in meeting their long- 
range goals. 


DESCRIPTIVE RESEARCH


The examples discussed up to this point in the chapter have been desig-
nated as assessment studies and evaluation studies. Descriptive research
studies have all of the following characteristics, which distinguish them from
the types previously described and from those described in the next chapter:

1. They involve hypothesis formulation and testing.

2. They use the logical methods of inductive-deductive reasoning to 
arrive at generalizations. 

3. They often employ methods of randomization so that error may be 
estimated when inferring population characteristics from observations 
of samples. 

4. The variables and procedures are described as accurately and com- 
pletely as possible so that the study can be replicated by other re- 
searchers. 

5. They are nonexperimental, for they deal with the relationships be- 
tween nonmanipulated variables in a natural rather than artificial 
setting. Since the events or conditions have already occurred or exist, 
the researcher selects the relevant variables for an analysis of their 
relationships. 


Quantitative and Qualitative Research 


Descriptive research can be divided into two broad categories: quantitative 
research and qualitative research. Quantitative research consists of those stud- 
ies in which the data concerned can be analyzed in terms of numbers. An 
example of quantitative research might be a study comparing two methods 
of teaching reading to first-grade children, because the data used to de- 
termine which method is more successful will be a test score. The average 
score of the children receiving one method will be compared to the average 
score of children receiving the other method. This example would be an 
experimental study (discussed in Chapter 5) if the experimenter randomly 
assigned the children to the methods, or a descriptive study if the children 
had already received the instruction and the experimenter was merely 
examining the results after the fact (see ex post facto studies later in this 
chapter). In either case the study would be considered quantitative. 
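
The comparison of average scores described above can be sketched numerically. The example below is an illustration only: the scores are hypothetical, and the use of Welch's t statistic to express the size of the difference relative to sampling error is an assumption added for the sketch, since the passage itself speaks only of comparing averages.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for the difference between two independent group means."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Hypothetical reading test scores for two groups of first-grade children.
method_a = [72, 75, 78, 80, 85]
method_b = [65, 70, 71, 74, 80]

difference = mean(method_a) - mean(method_b)
print(f"mean A = {mean(method_a)}, mean B = {mean(method_b)}")
print(f"difference = {difference}, t = {welch_t(method_a, method_b):.2f}")
```

The arithmetic is the same whether the children were randomly assigned to the methods or had already received the instruction. What changes is how confidently the difference can be attributed to the teaching method.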

Research can also be qualitative, that is, it can describe events, persons, 
and so forth scientifically without the use of numerical data. A study con- 
sisting of interviews of mothers of handicapped infants to determine how 
their lives and beliefs were affected by the birth of their handicapped 
children is an example of qualitative research. Such a study would carefully 
and logically analyze the responses of the mothers and report those re- 
sponses that are consistent as well as areas of disagreement. 

Each of these types of research has advantages and disadvantages. In 
quantitative research, the experimenter has carefully planned the study 
including the tests, or other data collection instruments, to be used. Each 
subject is studied in an identical manner and there is little room for human
bias to create problems with the data. Qualitative research is also planned 
carefully. Yet, qualitative studies leave open the possibility to change, to 
ask different questions, and to go in the direction that the observation may
lead the experimenter. Quantitative research is based more directly on its 
original plans and its results are more readily analyzed and interpreted. 
Qualitative research is more open and responsive to its subject. Both types 
of research are valid and useful. They are also not mutually exclusive. It 
is possible for a single investigation to use both methods. For instance, a 
study of mothers of handicapped infants might include interviews, as men- 
tioned earlier, and measures of religiosity and knowledge regarding their 
child's handicap. Such a study would interpret the interview data qualita- 
tively and the measures of religiosity and knowledge quantitatively. While 
studies combining these approaches are rare and difficult, the benefits can 
outweigh the difficulties. 

Of the types of descriptive research that follow, the first three, doc- 
ument analysis, case studies, and ethnographic studies, are types of qual- 
itative research. The ex post facto studies in this chapter and the experimental 
and quasi-experimental designs in Chapter 5 are quantitative research. 


Document or Content Analysis 


Documents are an important source of data in many areas of investigation, 
and the methods of analysis are similar to those used by historians. The 
major difference between this type of research and historical research is 
that, while historical research often uses document analysis, it deals solely 
with past events. When document analysis is used as descriptive research, 
current documents and issues are the foci. The analysis is concerned with 
the explanation of the status of some phenomenon at a particular time or 
its development over a period of time. The activity may be classified as 
descriptive research, for problem identification, hypothesis formulation, 
sampling, and systematic observation of variable relationships may lead to 
generalizations. It serves a useful purpose in adding knowledge to fields 
of inquiry and in explaining certain social events. Its application to edu- 
cational research is suggested in some of the studies listed as examples. 

In documentary analysis, the following may be used as sources of
data: records, reports, printed forms, letters, autobiographies, diaries,
compositions, themes or other academic work, books, periodicals, bulletins or
catalogues, syllabi, court decisions, pictures, films, and cartoons.

When using documentary sources, one must bear in mind that data 
appearing in print are not necessarily trustworthy. Documents used in 
descriptive research must be subjected to the same careful types of criticism 
employed by the historian. Not only is the authenticity of the document 
important, but the validity of its contents is crucial. It is the researcher's 
obligation to establish the trustworthiness of all data that he or she draws 
from documentary sources. 

The following purposes may be served through documentary analysis
(examples of actual studies are given as illustrations). The first five purposes
are of a descriptive research nature while the subsequent three are historical
in nature:

1. To describe prevailing practices or conditions.

   Entrance Requirements of Ohio Colleges as Revealed by an Analysis of
   College Bulletins

   Criteria for Primary Pupil Evaluation Used on Marion County Report Cards

2. To discover the relative importance of, or interest in, certain topics
   or problems.

   Public Information on Education as Measured by Newspaper Coverage
   in Three Indianapolis Daily Newspapers during the Month of December,
   1958

   Statistical Concepts Presented in College Textbooks in Educational
   Research Published since 1940

3. To discover the level of difficulty of presentation in textbooks or in
   other publications.

   The Vocabulary Level of Intermediate Science Textbooks

   Abstract Concepts Found in First-grade Readers

4. To evaluate bias, prejudice, or propaganda in textbook presentation.

   The Soviet Union as Presented in High School History Textbooks

   The Free Enterprise System as Pictured in High School Social Problems
   Textbooks

   Racial and Religious Stereotypes in Junior High School Literature Textbooks

5. To analyze types of errors in students' work.

   Typing Errors of First Semester Typing Students at Shortridge High School

   Errors in English Usage Found in Letters of Application for Admission to
   the University of Wisconsin

6. To analyze the use of symbols representing persons, political parties
   or institutions, countries, or points of view.

   Great Britain as a Symbol, as Represented in New York City Newspaper
   Cartoons in the Decade, 1930-1940

   The New Dealer as Depicted in the American Press from 1932 to 1942

7. To identify the literary style, concepts, or beliefs of a writer.

   Shakespeare's Use of the Metaphor

   Alexander Campbell's Concept of the Trinity, as Revealed in His Sermons

   John Dewey's Interpretation of Education as Growth

8. To explain the possible causal factors related to some outcome, action,
   or event.

   The Effect of Media Coverage upon the Outcome of the 1976 Presidential
   Election

   The Influence of Newspaper Editorials upon the Action of the State Assembly
   on Sales Tax Legislation

Content or document analysis should serve a useful purpose in yielding
information that is helpful in evaluating or explaining social or educational
practices. Since there are so many significant areas to be investigated,
setting up studies for the pure joy of counting and tabulating has
little justification. "The Uses of Shall and Will in the Spectator Papers" or
"The Use of Too, Meaning Also, in the Works of Keats" would seem to
add little useful knowledge to the field of literature.
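
When counting does serve such a purpose, as in a study of newspaper coverage of education, the tabulation step itself is simple. The following is a minimal sketch, assuming each item has already been assigned one topic category by a human coder. The categories, the data, and the function name tabulate_categories are hypothetical and are not drawn from any study listed above.

```python
from collections import Counter

# Hypothetical coding results: each newspaper item has already been assigned
# one topic category by a human coder (categories are illustrative only).
coded_items = [
    "school finance", "athletics", "curriculum", "athletics",
    "school finance", "athletics", "teacher salaries", "curriculum",
]


def tabulate_categories(items):
    """Count items per category and return (category, count, share) rows,
    ordered from most to least frequent."""
    counts = Counter(items)
    total = len(items)
    return [(category, n, n / total) for category, n in counts.most_common()]


for category, n, share in tabulate_categories(coded_items):
    print(f"{category:<18}{n:>3}{share:>8.1%}")
```

The tally is the easy part. The value of the analysis rests on the choice of categories and on how consistently the coder applies them.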


The Case Study 


The case study is a way of organizing social data for the purpose of viewing 
social reality. It examines a social unit as a whole. The unit may be a person, 
a family, a social group, a social institution, or a community. The purpose 
is to understand the life cycle or an important part of the life cycle of the 
unit. The case study probes deeply and analyzes interactions between the 
factors that explain present status or that influence change or growth. It 
is a longitudinal approach, showing development over a period of time. 

The element of typicalness, rather than uniqueness, is the focus of 
attention, for an emphasis upon uniqueness would preclude scientific ab- 
straction and generalization of findings. As Bromley (1986) notes, "A 'case' 
is not only about a ‘person’ but also about that ‘kind of person’. A case is 
an exemplar of, perhaps even a prototype for, a category of individuals” 
(p. 295). Thus, the selection of the subject of the case study needs to be 
done carefully in order to assure that he or she is typical of those to whom 
we wish to generalize. 

Data may be gathered by a wide variety of methods, including 


1. Observation by the researcher or his or her informants of physical 
characteristics, social qualities, or behavior 

2. Interviews with the subject(s), relatives, friends, teachers, counselors, 
and others 

3. Questionnaires, opinionnaires, psychological tests and inventories 


4. Recorded data from newspapers, schools, courts, clinics, government 
agencies, or other sources. 


A single case study emphasizes analysis in depth. Though it may be 
fruitful in developing hypotheses to be tested, it is not directed toward 
broad generalizations. One cannot generalize from a number (N) of 1. To 
the extent that a single case may represent an atypical situation, the ob- 
servation is sound. But if the objective analysis of an adequate sample of 
cases leads researchers to consistent observations of significant variable 
relationships, hypotheses may be confirmed, leading to valid generaliza- 
tions. 

The individual case study has been a time-honored procedure in the 
field of medicine and medical research. Sigmund Freud was a pioneer in
using case study methods in the field of psychiatry. In an effort to treat 
his psychoneurotic patients, he began to discover consistent patterns of 
experience. Under his careful probing, patients recalled long-forgotten, 
traumatic incidents in their childhood and youth. Freud hypothesized that 
these incidents probably explained their neurotic behavior (Strachey, 1964). 

His famous case history of Sergei Pankejeff, "the Wolf Man," published
in 1918 under the title From the History of an Infantile Neurosis, is one of the
classic examples of Freud's use of the case study. He believed that these
case studies confirmed his hypothesis, leading to psychoanalysis as a method
of treatment. He also used them to demonstrate how theoretical models
could be used to provide concrete examples.

Case studies are not confined to the study of individuals and their 
behavioral characteristics. Case studies have been made of all types of 
communities, from hamlet to great metropolis, and of all types of individ- 
uals—alcoholics, drug addicts, juvenile delinquents, migratory workers, 
sharecroppers, industrial workers, members of professions, executives, army 
wives, trailer court residents, members of social classes, Quakers, Amish, 
members of other religious sects and denominations, black Americans, 
American Indians, Chinese-Americans, Hispanics, and many other social 
and ethnic groups. Such institutions as colleges, churches, corrective insti- 
tutions, welfare agencies, fraternal organizations, and business groups have 
been studied as cases. These studies have been conducted for the purpose 
of understanding the culture and the development of variable relationships. 

For example, a community study is a thorough observation and anal- 
ysis of a group of people living together in a particular geographic location 
in a corporate way. The study deals with such elements of community life 
as location, appearance, prevailing economic activity, climate and natural 
resources, historical development, mode of life, social structure, goals or 
life values and patterns, the individuals or power groups that exert the 
dominant influence, and the impact of the outside world. It also evaluates
the social institutions that meet the basic human needs of health, protection, 
making a living, education, religious expression, and recreation. 

The early community studies of Lynd and Lynd are well known. The 
first, Middletown (1929), and the second, Middletown in Transition (1937), 
described the way of life in Muncie, Indiana, a typical midwestern, average- 
size city, tracing its development from the gas boom of the 1890s through 
World War I, the prosperity of the twenties, and the depression of the 
thirties. West (1945) described the nature of a very small community in 
the Ozark region in Plainville, USA. Sherman and Henry (1933) studied
the way of life in five “hollow” communities, hidden in the Blue Ridge 
Mountains, in Hollow Folk. 

Some community studies have singled out particular aspects for special
investigation. Drake and Cayton (1945) described life in the black section
of Chicago in Black Metropolis. Hollingshead (1949) portrayed the life
of adolescents in a small Illinois community in Elmtown’s Youth. Warner and 
Lunt (1941) developed a hypothesis of social class structure in a New Eng- 
land community in their study of Newburyport, Massachusetts, in Social 
Life in a Modern Community. Lucas (1970) compared the way of life in three 
Canadian communities in Minetown, Milltown, Railtown: Life in Canadian 
Communities of Single Industry. 

Although the case study is a useful method of organizing research 
observations, certain precautions should be considered: 


1. The method may look deceptively simple. To use it effectively, the
researcher must be thoroughly familiar with existing theoretical 
knowledge of the field of inquiry, and skillful in isolating the signif- 
icant variables from many that are irrelevant. There is a tendency to 
select variables because of their spectacular nature rather than for 
their crucial significance. 

2. Subjective bias is a constant threat to objective data-gathering and 
analysis. The danger of selecting variable relationships based upon 
preconceived convictions and the apparent consistency of a too limited 
sample of observations may lead the researcher to an unwarranted 
feeling of certainty about the validity of his or her conclusions. 

3. Effects may be wrongly attributed to factors that are merely associated 
rather than cause-and-effect related. While the case study process is 
susceptible to this post hoc fallacy, it is also a hazard associated with 
other types of nonexperimental studies. 


Ethnographic Studies 


Ethnography, sometimes known as cultural anthropology or more recently 
as naturalistic inquiry, is a method of field study observation that became 
popular in the latter part of the nineteenth century. It has continued to 
show significant development, suggesting promising techniques for the 
study of behavior in an educational situation. In its early application, it 
consisted of participant observation, conversation, and the use of inform- 
ants to study the cultural characteristics of primitive people: African, South 
Sea Island, and American Indian tribes. These groups were small in num- 
ber, geographically and culturally isolated, with little specialization in social 
function, and with simple economies and technology. Such cultural features 
as language, marriage and family life, child-rearing practices, religious 
beliefs and practices, social relations and rules of conduct, political insti- 
tutions, and methods of production were analyzed. 

The data gathered consisted of observation of patterns of action,
verbal and nonverbal interaction between members of the tribe as well as 
between the subjects and the researcher and his or her informants, and 
the examination of whatever records or artifacts were available. 
Many early studies were subsequently criticized on the grounds that
the anthropologist spent too little time among the people of the tribe to 
get more than a superficial view, didn't learn the native language and had 
to depend too much on the reports of poorly trained informants, and relied 
too much on his or her own cultural perspective, reaching ethnocentric, 
judgmental conclusions that resulted in stereotyped theories of the devel- 
opment of the primitive society. 

Later investigators realized that studies of this type would be invalid 
unless the observer 


1. Lived for a much more extensive period of time among the tribe and 
became an integrated member of the social group 

2. Learned the native language, enabling him or her to develop the 
sensitivity to think, feel, and interpret observations in terms of the 
tribe's concepts, feelings, and values, while at the same time supple- 
menting his or her own objective judgment in interpreting observa- 
tions 

3. Trained his or her informants to systematically record field data in 
their own language and cultural perspective. 


This refinement of participant observation resulted in more objective 
and valid observation and analysis. Some studies were directed toward the 
examination of the total way of life of a group. Other studies singled out 
a particular phase of the culture for intensive analysis, taking into account 
those elements that were relevant to the problem. 

In her classic study, Coming of Age in Samoa, Mead (1928) observed 
the development of 53 adolescent girls in a permissive Samoan society. She 
concluded that there were no differences in the physical processes of ad- 
olescent growth between Samoan and American girls: The differences were 
differences in response. The difficulties of this period of development, a 
troublesome feature of American life, do not occur in Samoa. She attributed 
the difference to Samoa's more homogeneous culture, a single set of re- 
ligious and moral beliefs, and a wider kinship network that conferred 
authority and affection. The difficulties of American girls were attributed 
to cultural restraints, not nature. 

Many of the time-honored techniques of the ethnographic study in- 
volving integration into the group and observation are being applied to 
psychology and education, as well as anthropology and sociology. An ex- 
cellent example of this methodology applied to an educational issue is a 
recent study of school principals. Morris, Crowson, Porter-Gehrie, and 
Hurwitz (1984) were interested in determining exactly what principals ac- 
tually do and how much time is spent on those activities. Their procedure 
was to have each principal observed for up to 12 full work days. The 
observers followed the principal wherever he or she went. The authors 
"were interested in whom the principal interacted with and by what means 
(verbal face to face, written word, telephone, etc.). We wanted to know
which party initiated each interchange, whether it was planned or spon- 
taneous, how long it lasted, and where it took place. Most important, we 
wanted to follow the changing subject matters of these conversations, not 
only to see what topics consumed the principal's time but also to trace the 
rhythm of the principal's working hours" (Morris, et al., 1984, p. v).* One
of the conclusions of this study was that principals usually spend less than 
half their work day in their offices, that they have a good deal of discretion 
in their decision-making, and that the principal's behavior "affects four 
distinct ‘constituencies’ ": teachers and students, parents and others in the 
community, superiors, and the principal him- or herself (Morris, et al., 
1984, p. v). 

The ethnographic study is a qualitative approach, employing few, if 
any, quantitative data-gathering instruments. Using the method of obser- 
vation, the researcher observes, listens to, and sometimes converses with 
the subjects in as free and natural an atmosphere as possible. The as- 
sumption is that the most important behavior of individuals in groups is a 
dynamic process of complex interactions and consists of more than a set 
of facts, statistics, or even discrete incidents. The strength of this kind of 
study lies in the observation of natural behavior in a real-life setting, free 
from the constraints of more conventional research procedures. 

Another assumption is that human behavior is influenced by the setting
in which it occurs. The researcher must understand that setting and
the nature of the social structure: its traditions, values, and norms of
behavior. It is important to observe and interpret as an outside observer but
also to observe and interpret in terms of the subjects—how they view the 
situation, how they interpret their own thoughts, words, and activities, as 
well as those of others in the group. The researcher gets inside the minds 
of the subjects, while at the same time interpreting the behavior from his 
or her own perspective. 

The relationship of researchers to their subjects is based upon trust 
and confidence. Researchers do not allow themselves to be aligned with 
either the authority figures or the subjects. A position of neutrality is es- 
sential to objective participant observation. 

Unlike conventional deductive quantitative research, participant ob- 
servers begin without preconceptions and hypotheses. Using inductive logic, 
they build their hypotheses as they are suggested by observations. They 
periodically reevaluate them on the basis of new observations, modifying 
them when they appear to be inconsistent with the evidence. They look 
for negative evidence to challenge their temporary hypotheses. In a sense, 
this type of research has the characteristics of a series of consecutive studies. 
Unlike the conventional research study, the interpretation is not deferred
to the conclusion but is a constant ongoing process of testing tentative
hypotheses against additional observations in a real situation.

*Used with the permission of the authors and of Charles E. Merrill Publishing Co.

Ethnographic methods of research have been used to investigate such 
problems as: 


1. Student Leadership Roles in an Urban, Racially Integrated High School

2. Pupil-Teacher Relationships in a Suburban Junior High School

3. Social Relationships in a Class of Emotionally Disturbed Children

4. Changes in Attitudes and Behavior in a Drug Abuse Rehabilitation
Center

5. The Social Class Structure of a Florida, Cuban-American Community

6. Staff-Parent Interactions in an Individualized Education Plan (IEP)
Staffing


EX POST FACTO OR CAUSAL-COMPARATIVE STUDIES 


Descriptive research seeks to find answers to questions through the analysis 
of variable relationships. What factors seem to be associated with certain 
occurrences, outcomes, conditions, or types of behaviors? Because it is often 
impracticable or unethical to arrange occurrences, an analysis of past events 
or of already existing conditions may be the only feasible way to study 
causation. This type of research is usually referred to as ex post facto or 
causal-comparative research or, when correlational analyses are used, it may 
be referred to as correlational research. 

For example, one would not arrange automobile accidents in order 
to study their causes. The automobile industry, police departments, safety 
commissions, and insurance companies study the conditions associated with 
the accidents that have occurred. Such factors as mechanical faults or fail- 
ures, excessive speed, driving under the influence of alcohol, and others 
have been identified as causal. 

However, while studies of past events may be the only practicable way
to investigate certain problems, the researcher needs to be aware of the
problems inherent in this type of research. The researcher must be cognizant
of the fact that the information used in ex post facto studies may be
incomplete. That is, the researcher may not have sufficient information
about all of the events and variables that were occurring at the time being
studied. This lack of control or even of knowledge regarding what variables
were controlled makes causal statements based upon this type of research
very difficult to make.

Research on cigarette smoking has had a tremendous effect on society.
Laws banning television advertising and cigarette smoking in certain areas
resulted from the U.S. Surgeon General's reports (1964, 1979). These
reports compiled the research of epidemiologists on the effects of smoking
on a person's health. Epidemiological research methods are used to study
trends and incidences of disease and are descriptive in nature. The
epidemiological research on smoking included two types of descriptive
methodology: retrospective studies relate personal histories with medical and
mortality records; prospective studies follow a group of individuals for an
indefinite period or until they die. The early studies, from 1939 to the
early 1960s, were primarily retrospective. These studies found that persons
who had died of lung cancer were more likely to have been cigarette
smokers than nonsmokers.


TABLE 4-1 Expected Death Rates for Smokers

UNDERLYING CAUSE OF DEATH                 EXPECTED  OBSERVED  MORTALITY
                                            DEATHS    DEATHS      RATIO

Cancer of lung (162-3)                       170.3     1,833       10.8
Bronchitis and emphysema (502, 521.1)         89.5       546        6.1
Cancer of larynx (161)                        14.0        75        5.4
Oral cancer (140-8)                           37.0       152        4.1
Cancer of esophagus (150)                     33.7       113        3.4
Stomach and duodenal ulcers (540, 541)       105.1       294        2.8
Other circulatory diseases (451-68)          254.0       649        2.6
Cirrhosis of liver (581)                     169.2       379        2.2
Cancer of bladder (181)                      111.6       216        1.9
Coronary artery disease (420)              6,430.7    11,177        1.7
Other heart diseases (421-2, 430-4)          526.0       868        1.7
Hypertensive heart (440-3)                   409.2       631        1.5
General arteriosclerosis (450)               210.7       310        1.5
Cancer of kidney (180)                        79.0       120        1.5
All causes                                15,653.9    26,223       1.68

(Surgeon General's Report, 1964).

A number of prospective studies, begun in the 1950s, found a greater 
likelihood of a variety of health problems among smokers than nonsmokers. 
Table 4-1 (Table 2 in Chapter 4 of the U.S. Surgeon General's [1964]
report) shows, for seven prospective studies combined, the expected number
of deaths (based on the overall death rates for persons and the ages of the
subjects) and the actual number of deaths. The mortality ratio is simply
observed deaths divided by expected deaths. As the Surgeon General's report states:


The mortality ratio for male cigarette smokers compared with non-smokers, 
for all causes of death taken together, is 1.68, representing a total death rate 
nearly 70 percent higher than for non-smokers. (This ratio includes death 
rates for diseases not listed in the table as well as the 14 disease categories 
shown.) 

In the combined results from the seven studies, the mortality ratio of cig- 
arette smokers over non-smokers was particularly high for a number of dis- 
eases: cancer of the lung (10.8), bronchitis and emphysema (6.1), cancer of 
the larynx (5.4), oral cancer (4.1), cancer of the esophagus (3.4), peptic ulcer 
(2.8), and the group of other circulatory diseases (2.6). For coronary artery
disease the mortality ratio was 1.7. 

Expressed in percentage-form, this is equivalent to a statement that for 
coronary artery disease, the leading cause of death in this country, the death 
rate is 70 percent higher for cigarette smokers. For chronic bronchitis and 
emphysema, which are among the leading causes of severe disability, the 
death rate for cigarette smokers is 500 percent higher than for non-smokers. 
For lung cancer, the most frequent site of cancer in men, the death rate is 
nearly 1,000 percent higher. (pp. 28-29)
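
The arithmetic behind the mortality ratio, and behind the "percent higher" statements in the quotation, can be sketched as follows. The two function names are assumptions of this sketch; the lung cancer figures are those given in Table 4-1.

```python
def mortality_ratio(observed_deaths, expected_deaths):
    """Observed deaths divided by expected deaths."""
    return observed_deaths / expected_deaths


def percent_higher(ratio):
    """Express a mortality ratio as the percentage by which the death rate
    exceeds the comparison rate (a ratio of 1.0 means no excess)."""
    return (ratio - 1) * 100


# Lung cancer row of Table 4-1: 170.3 expected deaths, 1,833 observed deaths.
lung_ratio = mortality_ratio(1833, 170.3)
print(f"Lung cancer mortality ratio: {lung_ratio:.1f}")
print(f"Lung cancer death rate: {percent_higher(lung_ratio):.0f} percent higher")

# The report's all-causes ratio of 1.68, restated as a percentage.
print(f"All causes: {percent_higher(1.68):.0f} percent higher")
```

The lung cancer ratio comes to about 10.8, or roughly 976 percent higher, which the report rounds to "nearly 1,000 percent." The all-causes ratio of 1.68 corresponds to a death rate 68 percent higher, the report's "nearly 70 percent."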


While this evidence appears overwhelming, it is not totally convincing 
by itself. Since the researchers could not randomly assign persons to the 
smoking and nonsmoking groups, it is possible that persons who decide to 
smoke are particularly nervous individuals and that it is their nervousness, 
not their smoking, that causes their greater incidence of illness and early 
death. Of course this research, along with chemical analyses indicating 
carcinogens in cigarette smoke and animal studies, is convincing to the vast 
majority of scientists and the public. 
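
The logic of the alternative explanation, though not its truth, can be shown with a small sketch. The counts below are purely hypothetical, invented so that death rates depend only on temperament and not on smoking. They are not data on smoking, and the function names are assumptions of this sketch.

```python
# Purely hypothetical counts, invented for illustration: (persons, deaths)
# for each combination of temperament and smoking status.
counts = {
    ("nervous", "smoker"): (800, 160),
    ("nervous", "nonsmoker"): (200, 40),
    ("calm", "smoker"): (200, 10),
    ("calm", "nonsmoker"): (800, 40),
}


def death_rate(cells):
    """Pooled death rate over a list of (persons, deaths) cells."""
    persons = sum(p for p, _ in cells)
    deaths = sum(d for _, d in cells)
    return deaths / persons


def rate_ratio(temperaments):
    """Smoker death rate divided by nonsmoker death rate, pooled over the
    given temperament groups."""
    smokers = [counts[(t, "smoker")] for t in temperaments]
    nonsmokers = [counts[(t, "nonsmoker")] for t in temperaments]
    return death_rate(smokers) / death_rate(nonsmokers)


crude_ratio = rate_ratio(["nervous", "calm"])  # temperament ignored
nervous_ratio = rate_ratio(["nervous"])        # within the nervous group
calm_ratio = rate_ratio(["calm"])              # within the calm group

print(f"Ratio ignoring temperament: {crude_ratio:.2f}")
print(f"Ratio among nervous persons: {nervous_ratio:.2f}")
print(f"Ratio among calm persons: {calm_ratio:.2f}")
```

In these invented figures, smokers appear to die at about twice the rate of nonsmokers when temperament is ignored, yet within each temperament group the two rates are identical. This is the pattern an uncontrolled variable could produce, and it is why ex post facto findings need support from other kinds of evidence, such as the chemical analyses and animal studies mentioned above.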

Studies of juvenile delinquency may compare the social and educa- 
tional backgrounds of delinquents and nondelinquents. What factors, if 
any, were common to the nondelinquent group? Any factors common to 
one group, but not to the other, might serve as a possible explanation of 
the underlying causes of delinquency. 

Some efforts have been made to associate good or poor teaching with 
the type of educational institution in which the teachers prepared. Those 
studies have proved inconclusive, possibly for a number of reasons. In 
addition to the difficulty of finding valid and satisfactory criteria of good 
and poor teaching, many factors other than type of college attended seem 
to be significant. Such variables as quality of scholarship, socioeconomic 
status, personality qualities, types of nonschool experiences, attitudes to- 
ward the teaching profession, and a host of others have possible relevancy. 


Sesame Street studies. Minton (1975) studied the effect of viewing the 
children’s television program, “Sesame Street,” on the reading readiness 
of kindergarten children. Of three sample groups, a 1968, a 1969, and a 
1970 group, only the 1970 group had viewed the program. 


Reading Readiness and "Sesame Street"

SAMPLE GROUP       N    WHITE    BLACK    SPANISH-SPEAKING

1968             482      431       51                  18
1969             495      434       61                   9
1970             524      436       88                  25

From "Impact of Sesame Street on Reading Readiness" by J. M. Minton, Sociology of Education,
1975, 48, 141-51. Reprinted by permission.

Scores on the Metropolitan Reading Readiness Test battery, consisting
of six subtests (word meaning, listening, matching, alphabet letter recognition,
numbers, and copying text) were used to measure readiness. Using
a pretest-posttest design, the mean gain scores of the 1970 group were compared
with those of the 1968 and 1969 groups.
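
A mean gain score is the average of each child's posttest score minus his or her pretest score. The following is a minimal sketch of that computation. The scores are hypothetical and are not data from Minton (1975), and the function name mean_gain is an assumption of this sketch.

```python
from statistics import mean


def mean_gain(pretest, posttest):
    """Mean of per-child gain scores (posttest minus pretest)."""
    gains = [post - pre for pre, post in zip(pretest, posttest)]
    return mean(gains)


# Hypothetical scores for illustration only; not data from Minton (1975).
viewers_pre, viewers_post = [40, 35, 50, 45], [52, 44, 61, 55]
nonviewers_pre, nonviewers_post = [42, 36, 48, 44], [51, 45, 58, 52]

viewer_gain = mean_gain(viewers_pre, viewers_post)
nonviewer_gain = mean_gain(nonviewers_pre, nonviewers_post)

print(f"Mean gain, viewing group: {viewer_gain}")
print(f"Mean gain, nonviewing group: {nonviewer_gain}")
print(f"Difference in mean gains: {viewer_gain - nonviewer_gain}")
```

Whether a difference in mean gains of this size is statistically significant is a separate question, answered by a significance test rather than by the subtraction itself.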

No significant differences at the 0.05 level were observed in total 
scores. On only one of the subtests, letter recognition, was a significant 
difference observed, favoring the 1970 group. In a classification by socio- 
economic status, advantaged children watched more and scored higher 
than disadvantaged children. The hypothesis that viewing "Sesame Street" 
would help to close the gap between advantaged and disadvantaged chil- 
dren was not supported; rather, the gap was widened. 

Anderson and Levin (1976) studied the effect of age on the viewing 
attention of small children to a 57-minute taped "Sesame Street" program, 
consisting of 41 bits, each ranging in length from 10 to 453 seconds. Six 
groups of five boys and five girls, ages 12, 18, 24, 30, 36, 42, and 48 months 
were observed by video tape recordings. In a viewing room, in the presence 
of parents, toys were provided as alternatives to viewing. The following 
observations were reported: 


1. Length of attention increased with age. The younger children appeared
to be more interested in the toys and interacting with their
mothers.

2. Length of attention decreased as bit length increased.

3. Attention to animals increased to 24 months but dropped thereafter.

4. Children showed more interest in the presence of women, lively music,
puppets, peculiar voices, rhyming, repetition, and motion.

5. Children showed less interest in the presence of adult men, animals,
inactivity, and still drawings.


REPLICATION AND SECONDARY ANALYSIS 


Replication, a combination of the terms repeat and duplicate, is an important 
method of challenging or verifying the conclusions of a previous study. 
When a replication uses different subjects at a different time and in a
different setting and arrives at conclusions that are consistent with those
of the previous study, it strengthens those conclusions and justifies more
confidence in their validity. Replication is essential to the development
and verification of new generalizations and theories.

Another useful procedure, known as secondary analysis, consists of 
reanalyzing the data gathered by a previous investigator, and may involve 
different hypotheses, different experimental designs, or different methods
of statistical analysis. The subjects are the same and the data are the same. 
The difference is that of alternative methods of analysis. 

Secondary analysis has a number of advantages that commend its use: 


1. The new investigator may bring an objectivity, a fresh point of view,
to the investigation and may think of better questions to be raised or
hypotheses to be tested. For example, the viewpoint of a psychologist
rather than that of a sociologist (or vice versa) may find greater meaning
in the data already available.

2. Secondary analysis may bring greater expertise to the area of inves- 
tigation and greater skill in experimental design and statistical anal- 
ysis. 

3. The reanalysis would involve less expense in both time and money. 
Because the data are already available, a more modest appropriation 
of funds would be possible. It would not be necessary to intrude upon 
the time of subjects (teachers and students) whose primary activities 
had been diverted in the original investigation. 

4. Secondary analysis may provide useful experience for students of 
research methodology by enabling them to use real data, rather than 
simulated or inferior data, for the purposes of the exercise. 


Secondary analysis has played an important part in educational re- 
search. Probably no investigation has been subjected to as great a degree 
of secondary analysis as the Equality of Educational Opportunity study, de- 
scribed next. 

Equality of Educational Opportunity study. In 1964, the Congress of the 
United States passed the Civil Rights Act, which directed the United States 
Commissioner of Education to carry out a study of "the lack of educational 
opportunity by reason of race, color, religion, or national origin in public 
educational institutions at all levels in the United States, its territories and 
possessions, and the District of Columbia." 

This authorization assumed that educational opportunity for mem- 
bers of minority groups was unequal to that available for white students. 
This study was one of the largest of its type ever conducted. The report 
of its findings, commonly known as the Coleman Report, was titled Equality 
of Educational Opportunity (Coleman, et al., 1966). 

The nationwide investigation selected, by a two-stage probability sam- 
ple, 640,000 public school pupils in grades 1, 3, 6, 9, and 12, and 60,000 
teachers in more than 4000 schools. Data were also gathered from parents, 
school principals, school district superintendents, and prominent com- 
munity members. In addition, case studies of individual cities were con- 
ducted by educators, lawyers, and sociologists. For comparative purposes 
the data were organized by geographic location as northern, northern
metropolitan, southern and southwestern, southern metropolitan, and mid- 
western and western. Individuals were classified as white, black, Asian, 
Indian, Mexican-American, and Puerto Rican. 

As much as possible, data-gathering instruments were checked for 
validity and reliability. Methods of data analysis included multiple corre- 
lation and factorial analysis of variance and covariance. 

Although it would not be feasible to present a detailed account of the 
findings of the study, a few of the major conclusions are included: 


1. The report rejected the assumption that the educational opportunities
provided for minority children were unequal. There seemed to be
little difference in almost all school facilities that would relate to
equality of opportunity. In some areas, minority schools seemed to be more
adequate than predominantly white schools.

2. Family background, rather than the characteristics of the school,
appeared to be the major influence on school achievement. It was
apparent that, over the years, the school experience did little to narrow
the initial achievement gap.

3. The socioeconomic composition of the student body was more highly
related to achievement than any school factor.

4. The achievement level in rank order was white, Asian-American,
American Indian, Mexican-American, Puerto Rican, and black. While
white students scored significantly higher than any other group,
Asian-Americans excelled in nonverbal and mathematics achievement.

5. Inequalities of educational opportunity were more closely related to
regional differences than to differences between predominantly
black and white schools. Schools in the North, Midwest, and
West seemed to have better facilities than those in the South and
Southwest.

6. Social class differences within all groups appeared to be more
significant than the differences between ethnic groups.


The Coleman Report has been subjected to criticism both by expe- 
rienced researchers and by members of special interest groups. The find- 
ings were unacceptable to some, who pointed out flaws in the gathering of 
data and their interpretation. Others found procedural defects in sampling 
and statistical analysis of the data. 

Of 900,000 pupils solicited, only about 640,000, or about two-thirds 
of the invited sample, were tested. Twenty-one metropolitan school districts 
refused to participate in the study, including such large cities as Boston, 
Chicago, Indianapolis, and Los Angeles. In addition, twenty-three other 
school districts, which participated to a limited degree, refused to test their
pupils. The provision for an equal number of white and nonwhite partic- 
ipants in the sample introduced a possible element of invalidity in the 
statistical analysis of the data. 

The questionnaires were criticized for their lack of what has been 
termed a “qualitative bite,” the effort to get beneath the surface for more 
meaningful responses. There was also a high degree of nonresponse to the 
questionnaires, particularly on some items of an emotional or controversial 
nature. For example, one-third of the principals failed to answer questions 
on the racial composition of their faculties.

Some critics believed that the report did not make a highly significant 
contribution to education, but most agreed that it did stimulate interest in 
further research concerning the relationship of the family, the school, and 
the community. 

The fact that no previous study has generated so much controversy 
is not surprising considering the complexity of the problems involved and 
the sensitive nature of the issues. For example, both advocates and
opponents of school busing viewed the data in the light of their own
established positions.

A number of studies using secondary analysis were authorized by
various government agencies, special commissions, and philanthropic foun- 
dations. Using the Coleman Report data, various aspects of the problem 
were examined more closely, using different statistical procedures and 
raising different questions. Some confined their investigations to data re- 
lating to a single geographic area while others considered a wider range 
of data analysis. Helpful resumes of several of these studies are included 
in the publication, On Equality of Educational Opportunity, edited by Mosteller 
and Moynihan (1972). 


Meta-analysis. A relatively recent innovation that allows a researcher 
to systematically and statistically combine the findings of several previous 
studies is known as meta-analysis, research synthesis, or research integration. 
There are a number of quantitative techniques, ranging from fairly simple 
to quite complex, by which the data from previously published studies can 
be combined. Glass (1978) and his colleagues (Glass, Smith, & Barton, 1979)
have developed and described some of these techniques. Walberg (1986) 
discusses the relative advantages of the traditional review of the literature 
and the statistical research synthesis. He suggests that a combination of 
these approaches can be useful in estimating the effects of a number of 
studies. Walberg and his colleagues have conducted a number of studies 
using these techniques. See the special issue of Evaluation in Education, 1980, 
Vol. 4, pp. 1-142, edited by Walberg and Haertel, for a selection of these 
and other research integration efforts. 
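
As an illustration of the simpler end of this range, the following Python sketch pools effect sizes from several studies by weighting each one by the inverse of its variance (a fixed-effect model). This is a common textbook technique chosen here for illustration; it is not presented as the specific procedure of Glass or of Walberg. The function name and the three sets of study values are hypothetical.

```python
import math

def combine_effect_sizes(effects, variances):
    """Fixed-effect, inverse-variance weighted mean of study effect sizes."""
    if len(effects) != len(variances) or not effects:
        raise ValueError("need one variance per effect size, and at least one study")
    # A study with a smaller variance (a more precise estimate) gets more weight.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return pooled, standard_error

# Hypothetical standardized mean differences and their variances (invented data).
effects = [0.30, 0.50, 0.10]
variances = [0.04, 0.09, 0.02]

pooled, se = combine_effect_sizes(effects, variances)
print(f"pooled effect size = {pooled:.3f}, standard error = {se:.3f}")
```

The pooled value lies closest to the third study's estimate because that study has the smallest variance and therefore the largest weight.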


THE POST HOC FALLACY


One of the most serious dangers of ex post facto and causal-comparative
research is the post hoc fallacy, the conclusion that because two factors go
together one must be the cause and the other the effect. Because there
seems to be a high relationship between the number of years of education
completed and earned income, many educators have argued that staying
in school will add x number of dollars of income over a period of time for
each additional year of education completed. Although there may be such
a relationship, it is also likely that some of the factors that influence young
people to seek additional education are more important than the educational
level completed. Such factors as socioeconomic status, persistence,
desire, willingness to postpone immediate gratification, and intelligence
level are undoubtedly significant factors in vocational success. Staying in
school may be a symptom rather than the cause.
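
The point can be made concrete with a small simulation. In the Python sketch below, a single background factor (standing in for influences such as socioeconomic status or persistence) contributes to both years of schooling and income, while income is generated with no reference to schooling at all. The model, the variable names, and all numbers are hypothetical and chosen only for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with a third variable z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 5000

# Hypothetical model: one background factor drives BOTH variables.
# Income is generated WITHOUT any reference to years of schooling.
background = [rng.gauss(0, 1) for _ in range(n)]
schooling = [b + rng.gauss(0, 1) for b in background]
income = [b + rng.gauss(0, 1) for b in background]

r_school_income = pearson_r(schooling, income)
r_controlled = partial_r(
    r_school_income,
    pearson_r(schooling, background),
    pearson_r(income, background),
)
print(f"schooling vs. income: r = {r_school_income:.2f}")
print(f"with the background factor held constant: r = {r_controlled:.2f}")
```

Under this model the expected correlation between schooling and income is .50, yet it falls to about zero once the background factor is held constant. A sizable correlation therefore arises even though schooling has no effect on income in the simulated data.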

Some critics of cigarette-cancer research have advanced a similar
argument. The case that they propose follows this line of reasoning: Let us
suppose that certain individuals with a type of glandular imbalance have
a tendency toward cancer. The imbalance induces a certain amount of
nervous tension. Because excessive cigarette smoking is a type of nervous
tension release, these individuals tend to be heavy smokers. The cancer
could result from the glandular imbalance rather than from the smoking,
which is only a symptom. This error of confusing symptoms or merely
associated factors with cause could lead researchers to deduce a false
cause-and-effect relationship.

This illustration is not presented to discredit this type of cancer re- 
search. Substantial evidence does suggest a significant relationship. Labo- 
ratory experiments have supported a causal relationship between the coal- 
tar products that are distilled from cigarette combustion and malignant 
growth in animals. The association explanation, however, is one that should
always be examined carefully. 

Ex post facto and causal-comparative research is widely and appropri- 
ately used, particularly in the behavioral sciences. In education, because it 
is impossible, impracticable, or unthinkable to manipulate such variables 
as aptitude, intelligence, personality traits, cultural deprivation, teacher 
competence, and some variables that might present an unacceptable threat
to human beings, this method will continue to be used. 

However, its limitations should be recognized: 


1. The independent variables cannot be manipulated.


2. Subjects cannot be randomly, or otherwise, assigned to treatment 
groups. 


3. Causes are often multiple rather than single. 


For these reasons scientists are reluctant to use the expression cause
and effect in nonexperimental studies in which the variables have not been
carefully manipulated. They prefer to observe that when variable A appears,
variable B is consistently associated, possibly for reasons not completely
understood or explained.

Since there is a danger of confusing symptoms with causes, ex post
facto research should test not just one hypothesis but other logical alternate
or competing hypotheses as well. Properly employed and cautiously interpreted,
it will continue to provide a useful methodology for the development
of knowledge.

Students who have completed a course in research methods should
be sensitive to the operation of extraneous variables that threaten the
validity of conclusions. Glass (1968) cautions educators of the need for critical
analysis of reported research. He cites a number of interesting examples
of carelessly conducted studies that resulted in completely false conclusions.
Unfortunately, these conclusions were accepted by gullible readers and
widely reported in popular periodicals and some educational psychology
textbooks.

The authors trust that the experience of the introductory course in
educational research will help students and educators to read research
reports more carefully and to apply more rigorous standards of judgment.


SUMMARY


The term descriptive studies has been used to classify a number of different
types of activity. This chapter points out the distinctions between three 
major categories: assessment, evaluation, and descriptive research. 

Assessment describes the status of a phenomenon at a particular time 
without value judgment, explanation of reasons or underlying causes, or 
recommendations for action. 

Evaluation adds to the description of status the element of value judg- 
ment, in terms of effectiveness, desirability, or social utility, and may suggest 
a course of action. No generalizations are extended beyond the situation
evaluated. 

Descriptive research is concerned with the analysis of the relationships 
between nonmanipulated variables and the development of generalizations, 
extending its conclusions beyond the sample observed. 

Assessment types of studies described are surveys, public opinion 
polls, the National Assessment of Educational Progress, the International Assess- 
ment of Educational Achievement, activity analysis, and trend studies. 

Evaluation studies included are school surveys and follow-up studies. 
The application of evaluation findings to social problem solving is discussed. 

Descriptive research studies include document or content analysis,
case studies, community studies, ethnographic studies, and ex post facto or 
explanatory observational studies. These methods have been described and 
examples provided. The hazards of the post hoc fallacy have been empha- 
sized. 


EXERCISES

1. Why is it sometimes difficult to distinguish between an assessment study, an
evaluation study, and a descriptive research project? Illustrate with an example.

2. Public opinion polls base their conclusions on a sample of approximately 1500
respondents. Is this an adequate sample for a nationwide survey?

3. In a 1974 study, the West Virginia State Department of Education reported that
counties with the highest per-pupil expenditure were the counties with the
highest level of academic achievement, and that this "shows for the first time
the clearest possible relationship between student achievement and the amount
of money invested in the public schools." Can you suggest several competing
hypotheses that might account for high academic achievement?

4. What is the difference between a study and a research project?

5. In what ways does conducting longitudinal studies run the risk of the violation 
of confidentiality of personal information? 

6. How can a study of money and investment trends help you provide for your 
future financial security? 

7. Draw up a proposal for a follow-up study of your high school graduating class 
of 5 years ago. Indicate what information you believe would be helpful in im- 
proving the curriculum of the school. 

8. Of what value are the findings of the annual Gallup poll of public attitudes 
toward education? 

9. How could the survey type of study be helpful in arriving at solutions to the 
crime problem in large cities? 


EXPERIMENTAL AND QUASI-EXPERIMENTAL RESEARCH


Experimental research provides a systematic and logical method for an- 
swering the question, "If this is done under carefully controlled conditions, 
what will happen?" Experimenters manipulate certain stimuli, treatments, 
or environmental conditions and observe how the condition or behavior 
of the subject is affected or changed. Their manipulation is deliberate and 
systematic. They must be aware of other factors that could influence the 
outcome and remove or control them so that they can establish a logical 
association between manipulated factors and observed effects. 

Experimentation provides a method of hypothesis testing. After ex- 
perimenters define a problem, they propose a tentative answer, or hy- 
pothesis. They test the hypothesis and confirm or disconfirm it in the light 
of the controlled variable relationship that they have observed. It is im- 
portant to note that the confirmation or rejection of the hypothesis is stated 
in terms of probability rather than certainty. 

Experimentation is the classic method of the science laboratory, where 
elements manipulated and effects observed can be controlled. It is the most 
sophisticated, exacting, and powerful method for discovering and devel- 
oping an organized body of knowledge. 

Although the experimental method finds its greatest utility in the
laboratory, it has been effectively applied within nonlaboratory settings
such as the classroom, where significant factors or variables can be con- 
trolled to some degree. The immediate purpose of experimentation is to 
predict events in the experimental setting. The ultimate purpose is to gen- 
eralize the variable relationships so that they may be applied outside the 
laboratory to a wider population of interest. 


EARLY EXPERIMENTATION 


The earliest assumptions of experimental research were based upon what 
was known as the law of the single variable. John Stuart Mill defined this 
principle. He stated five rules or canons that he believed would include all 
types of logical procedure required to establish order among controlled 
events. 

One of his canons, known as the method of difference, states: 


If an instance in which the phenomenon under investigation occurs, and an 
instance in which it does not occur have every circumstance in common save 
one, that one occurring only in the former, the circumstance in which alone 
the two instances differ is the effect, or the cause, or an indispensable part 
of the cause, of the phenomenon. (Mill, 1873, p. 222)


In simpler language, if two situations are alike in every respect, and 
one element is added to one but not the other, any difference that develops 
is the effect of the added element; or, if two situations are alike in every 
respect, and one element is removed from one but not from the other, any 
difference that develops may be attributed to the subtracted element. 

The law of the single variable provided the basis for much early 
laboratory experimentation. In 1662, Robert Boyle, an Irish physicist, used 
this method in arriving at a principle upon which he formulated his law 
of gases: When temperature is held constant, the volume of an ideal gas 
is inversely proportional to the pressure exerted upon it. In other words, 
when pressure is raised, volume decreases; when pressure is lowered, vol- 
ume increases. In Boyle's Law, pressure is the single variable. 


\[ \frac{V_1}{V_2} = \frac{P_2}{P_1} \]


A little more than a century later, Jacques A. C. Charles, a French
physicist, discovered a companion principle, now known as Charles' Law.
He observed that when the pressure was held constant, the volume of an
ideal gas was directly proportional to the temperature. When temperature
is raised, volume increases; when temperature is lowered, volume decreases.
In Charles' Law, temperature is the single variable.
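
The volume-temperature relationship just described can be written in ratio form. This is a reconstruction rather than a formula quoted from the text. It assumes that temperature T is measured on an absolute scale (for example, kelvins), and that the subscripts 1 and 2 denote two states of the same gas at constant pressure.

```latex
\[ \frac{V_1}{V_2} = \frac{T_1}{T_2} \]
```

For instance, under these assumptions, raising a gas from 300 K to 330 K at constant pressure increases its volume by 10 percent.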


Although the concept of the single variable proved useful in some 
areas of the physical sciences, it failed to provide a sound approach to 
experimentation in the behavioral sciences. Despite its appealing simplicity 
and apparent logic, it did not provide an adequate method for studying 
complex problems. It assumed a highly artificial and restricted relationship 
between single variables. Rarely, if ever, are human events the result of 
single causes. They are usually the result of the interaction of many vari- 
ables, and an attempt to limit variables so that one can be isolated and 
observed proves impossible.

The contributions of R. A. Fisher, first applied in agricultural ex- 
perimentation, have provided a much more effective way of conducting 
realistic experimentation in the behavioral sciences. His concept of achiev- 
ing preexperimental equating of conditions through random selection of 
subjects and random assignment of treatments, and his concepts of analysis 
of variance and analysis of covariance, made possible the study of complex 
interactions through factorial designs, in which the influence of more than 
one independent variable upon more than one dependent variable could 
be observed. Current uses of this type of design will be discussed more 
fully later in this chapter. 


EXPERIMENTAL AND CONTROL GROUPS 


An experiment involves the comparison of the effects of a particular treat- 
ment with that of a different treatment or of no treatment. In a simple 
conventional experiment, reference is usually made to an experimental group 
and to a control group. 

These groups are equated as nearly as possible. The experimental 
group is exposed to the influence of the factor under consideration; the 
control group is not. Observations are then made to determine what dif- 
ference appears or what change or modification occurs in the experimental 
as contrasted with the control group. 

Sometimes it is also necessary to control for the effect of actually
participating in an experiment. Medical researchers have long recognized
that patients who receive any medication, regardless of its real efficacy,
tend to feel better or perform more effectively. In medical experiments, a
harmless or inert substitute is administered to the control group to offset
the psychological effect of medication. These substitutes, or placebos, are
indistinguishable from the real medication under investigation, and neither
experimental nor control subjects know whether they are receiving the
medication or the placebo. The effectiveness of the true medication is the
difference between the effect of the medication and that of the placebo.

What seems to be a similar psychological effect was recognized in a 
series of experiments at the Hawthorne Plant of the Western Electric Com- 
pany and originally published in 1933 (Mayo, 1960). The studies concerned 
the relationships between certain working conditions and worker output 
efficiency. Illumination was one of these manipulated experimental vari- 
ables. The researchers found that as light intensity was increased, worker 
output increased. After a certain peak was apparently reached, it was de- 
cided to see what effect the reduction of intensity of illumination would 
have. To the surprise of the researchers, as intensity was decreased by 
stages, output continued to increase. The researchers concluded that the 
attention given the workers and their awareness of participation in an
experiment apparently were important motivating factors. From these
studies the term Hawthorne Effect was introduced into the psychological
literature.

It has been commonly believed that this reactive effect of knowledge 
of participation in an experiment, the Hawthorne Effect, is similar to the 
medical placebo effect. Researchers have frequently devised nonmedical 
placebos to counteract this potential effect. One such device, used in con- 
nection with experiments involving the comparison of traditional teaching 
materials with new experimental materials, is to reprint the traditional, or
control, materials and to label both these and the new, experimental
materials "Experimental Method."

A group receiving a placebo is usually known as a placebo control 
group to distinguish it from the more common control group that receives 
nothing additional as a result of the study. 

Even when the subjects of a study are unlikely to know or care that 
they are participants in an experiment, it may be necessary to utilize a 
placebo control group. Research with severely and profoundly retarded 
children may result in increased time spent with the experimental group 
children over the control group children unless a placebo is introduced. 
An example is a study in which Kahn (1978) investigated the effect of a 
cognitive training program. Rather than have the usual control group, this 
study made sure that the nonexperimental group children received as much 
individual instruction as the experimental group children, albeit in areas 
other than the experimental treatment. Thus, this study used a placebo 
control group in order to assure that group differences were a result of 
the training procedure rather than additional attention. 

Experiments are not always characterized by a treatment-nontreat- 
ment comparison. Varying types, amounts, or degrees of the experimental 
factor may be applied to a number of groups. For example, in medical 
research an experiment to test the effectiveness of a particular medication
in reducing body temperature might involve administering a massive dos- 
age to one group, a normal dosage to a second, and a minimal dosage to 
a third. Because all the groups receive medication, there is no control group 
in the limited sense of the term, but control of the experimental factors 
and observation of their effects are essential elements. 

In educational research, varying types or degrees of an experimental 
factor might also be used with different groups. For instance, a researcher 
might compare three different methods of teaching a subject such as spell- 
ing. Or a researcher might wish to study the effect of class size on learning 
in a high school history course. Such a study might compare three classes 
of varying size, say 35, 30, and 25, to see which class did better. Of course, 
the researcher would have to be certain that all other factors (e.g.,
intelligence, prior knowledge, time of day, and length of instruction) were
equated.


VARIABLES 


Independent and Dependent Variables 


Variables are the conditions or characteristics that the experimenter ma- 
nipulates, controls, or observes. The independent variables are the conditions 
or characteristics that the experimenter manipulates or controls in his or 
her attempt to ascertain their relationship to observed phenomena. The 
dependent variables are the conditions or characteristics that appear, dis- 
appear, or change as the experimenter introduces, removes, or changes 
independent variables. 

In educational research an independent variable may be a particular
teaching method, a type of teaching material, a reward, a period of
exposure to a particular condition, or an attribute such as sex or level
of intelligence. The dependent variable may be a test score, the number
of errors, or measured speed in performing a task. Thus, the dependent 
variables are the measured changes in pupil performance attributable to 
the influence of the independent variables. 

There are two types of independent variables: treatment and organismic 
or attribute variables. Treatment variables are those factors that the exper- 
imenter manipulates and to which he or she assigns subjects. Attribute 
variables are those characteristics that cannot be altered by the
experimenter. Such independent variables as age, sex, race, and intelligence level
have already been determined, but the experimenter can decide to include
them or remove them as variables to be studied. The question of whether
8-year-old girls show greater reading achievement than 8-year-old boys is
an example of the use of an organismic variable, sex. The teaching
procedure is the same for both groups so there is no treatment independent
variable. 


Confounding Variables 


Confounding variables are those aspects of a study or sample that might 
influence the dependent variable (outcome measure) and whose effect may 
be confused with the effects of the independent variable. Confounding 
variables are of two types: intervening and extraneous variables. 


Intervening variables. In many types of behavioral research the rela- 
tionship between the independent and dependent variables is not a simple 
one of stimulus to response. Certain variables which cannot be controlled 
or measured directly may have an important effect upon the outcome. 
These modifying variables intervene between the cause and the effect. 

In a classroom language experiment a researcher is interested in de- 
termining the effect of immediate reinforcement upon learning the parts 
of speech. He or she suspects that certain factors or variables, other than 
the one being studied, immediate reinforcement, may be influencing the 
results, even though they cannot be observed directly. These factors, anx- 
iety, fatigue, motivation, for example, may be intervening variables. They 
are difficult to define in operational, observable, terms, but they cannot be 
ignored. Rather, they must be controlled as much as is feasible through 
the use of appropriate designs. 


Extraneous variables. Extraneous variables are those uncontrolled var- 
iables (i.e., variables not manipulated by the experimenter) that may have 
a significant influence upon the results of a study. Many research conclu- 
sions are questionable because of the influence of these extraneous vari- 
ables. 

In a widely publicized study, the effectiveness of three methods of 
social studies teaching was compared. Intact classes were used, and the 
researchers were unable to randomize or control such variables as teacher 
competence or enthusiasm, or the age, socioeconomic level, or academic 
ability of the student subjects. The criterion of effectiveness was achieve- 
ment, measured by scores on standardized tests. It would seem clear that 
the many extraneous variables precluded valid conclusions about the rel- 
ative effectiveness of the independent variables, which were teaching meth- 
ods. It should be noted that in order for an extraneous variable to confound 
the results of a study, it must be correlated strongly enough with both the 
independent and dependent variables that its influence can be mistaken 
for that of the independent variable. 

Although it is impossible to eliminate all extraneous variables, partic- 
ularly in classroom research, sound experimental design enables the re- 
searcher to largely neutralize their influence. 


CONTROLLING EXTRANEOUS VARIABLES


Variables that are of interest to the researcher can be controlled by building 
them into the study as independent variables. For instance, a researcher 
comparing two different reading programs may wish to control for the 
potentially confounding extraneous variable sex by making it an inde- 
pendent attribute variable and, thereby, investigating the effect of sex on 
the two different reading programs. 

Variables that are not of direct interest to the researcher may be 
removed or their influence minimized by several methods, which are dis- 
cussed in the following sections. 


Removing the variable. Variables may be controlled by eliminating them 
completely. Observer distraction may be removed by separating the ob- 
server from both experimental and control groups by a one-way glass 
partition. Some variables between subjects may be eliminated by selecting 
cases with uniform characteristics. Using only female subjects removes sex 
as a variable but thereby reduces the generalization from the study to only 
females. 


Randomization. Randomization involves pure chance selection and
assignment of subjects to experimental and control groups from a limited
supply of available subjects. Random selection was discussed in Chapter 1.
Here we are referring to random assignment, the method by which every- 
one already selected for the sample has an equal chance of being assigned 
to the various treatment conditions (e.g., experimental and control). 

If two groups are involved, randomization could be achieved by toss- 
ing a coin, assigning a subject to one group if heads appeared, to the other 
if the toss were tails. When more than two groups are involved, dice or a 
table of random numbers could be used.
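
In place of a coin, dice, or a printed table, a computer's random number generator can do the same job. The Python sketch below is one possible approach, not a prescribed procedure: it shuffles the list of subjects and deals them out to the conditions in turn. Unlike independent coin tosses, this variant also keeps the group sizes equal. The function name, the subject labels, and the seed are hypothetical.

```python
import random

def randomly_assign(subjects, conditions, seed=None):
    """Shuffle the subjects, then deal them out to the conditions in turn."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(subjects)  # copy, so the caller's list is left unchanged
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, subject in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(subject)
    return groups

# Twelve hypothetical subjects assigned to three conditions.
subjects = [f"S{n:02d}" for n in range(1, 13)]
conditions = ["experimental", "control", "placebo control"]
groups = randomly_assign(subjects, conditions, seed=1)

for condition, members in groups.items():
    print(condition, members)
```

Every subject has the same chance of landing in any condition, which is the defining property of random assignment described above.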

Randomization provides the most effective method of eliminating 
systematic bias and of minimizing the effect of extraneous variables. The 
principle is based upon the assumption that through random assignment, 
differences between groups result only from the operation of probability 
or chance. These differences are known as sampling error or error variance, 
and their magnitude can be established by the researcher. 

In an experiment, differences in the dependent variables that may 
be attributed to the effect of the independent variables are known as ex- 
perimental variance. The significance of an experiment may be tested by 
comparing experimental variance with error variance. If at the conclusion 
of the experiment the differences between the experimental and control 
groups are too great to attribute to error variance, it may be assumed that 
these differences are attributable to experimental variance. This process is 
described in detail in Chapter 9.


Matching cases. When randomization is not feasible (e.g., there are
too few subjects), selecting pairs or sets of individuals with identical or 
nearly identical characteristics and assigning one of them to the experi- 
mental group and the other to the control group provides another method 
of control. This method is limited by the difficulty of matching on more 
than one variable. It is also likely that some individuals will be excluded 
from the experiment if a matching subject is not available. Matching is not 
considered satisfactory unless the members of the pairs or sets are then 
randomly assigned to the treatment groups, a method known as matched 
randomization. 
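
Matched randomization can be sketched in a few lines. This is an illustration under stated assumptions: the subjects, their pretest scores, and the function name matched_randomization are hypothetical, and pairs are formed simply by ranking subjects on a single matching variable and taking neighbors, with no limit set on how far apart two "nearly identical" scores may be.

```python
import random


def matched_randomization(scores, seed=None):
    """Pair subjects with adjacent scores, then assign within each pair by chance.

    scores maps each subject to a score on the matching variable.
    Returns (experimental, control, unmatched).
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)   # order subjects by score
    experimental, control = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)                     # the coin toss within the pair
        experimental.append(pair[0])
        control.append(pair[1])
    # With an odd number of subjects, the last one has no partner
    unmatched = ranked[len(ranked) - len(ranked) % 2:]
    return experimental, control, unmatched


# Hypothetical pretest scores used as the matching variable
pretest_scores = {"A": 41, "B": 43, "C": 55, "D": 56, "E": 68, "F": 70, "G": 90}
experimental, control, unmatched = matched_randomization(pretest_scores, seed=7)
print(experimental, control, unmatched)
```

The leftover subject in the unmatched list illustrates the point made above: some individuals are excluded when no matching subject is available.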


Balancing cases, or group matching. Balancing cases consists of assign- 
ing subjects to experimental and control groups in such a way that the 
means and the variances of the groups are as nearly equal as possible. 
Because identical balancing of groups is impossible, the researcher must 
decide how much departure from equality can be tolerated without loss of 
satisfactory control. This method also presents a difficulty similar to that
noted in the matching method: namely, the difficulty of equating groups on
the basis of more than one characteristic or variable.


Analysis of covariance. This method permits the experimenter to elim- 
inate initial differences on several variables between the experimental and 
control groups by statistical methods. The use of pretest mean scores as
covariates is considered preferable to the conventional matching of groups.
Analysis of covariance is a rather complicated statistical procedure, beyond
the scope of this elementary treatment. For a complete discussion, readers 
may wish to consult Glass and Hopkins (1984), Hays (1981), Kerlinger 
(1986), Kirk (1982), or Winer (1971). 


OPERATIONAL DEFINITIONS OF VARIABLES 


Such variables as giftedness, academic achievement, and creativity are con- 
ceptualizations that are defined in dictionary terms. But because they can- 
not be observed directly, they are vague and ambiguous and provide a 
poor basis for identifying variables. Much more precise and unambiguous 
definitions of variables can be stated in operational form, which stipulates 
the operation by which they can be observed and measured. Giftedness 
could be operationally defined as a score two or more standard deviations 
above the mean on the Wechsler Adult Intelligence Scale, academic achievement
as a score on the 1973 edition of the Stanford Achievement Test, or
creativity as a score on the Torrance Tests of Creative Thinking. When an 
operational definition is used, there is no doubt about what the researcher
means.

To be useful, however, operational definitions must be based upon a
theory that is generally recognized as valid. Operational terms do not always 
prove to be useful in describing variables, for they could conceivably be 
based upon irrelevant behavior. Defining degree of self-esteem in terms 
of the number of times an individual smiles per minute would not be a 
useful or realistic definition, even though such behavior could easily be 
observed and recorded. 
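
An operational definition is precise enough to be written as a rule that anyone can apply. The sketch below expresses the giftedness example given earlier, assuming the usual Wechsler scaling of a mean of 100 and a standard deviation of 15; the function name is_gifted and the sample scores are illustrative only.

```python
WAIS_MEAN = 100   # Wechsler IQ scores are scaled to a mean of 100
WAIS_SD = 15      # and a standard deviation of 15


def is_gifted(iq_score, mean=WAIS_MEAN, sd=WAIS_SD, cutoff_sd=2.0):
    """Operational definition: a score two or more SDs above the mean."""
    return (iq_score - mean) / sd >= cutoff_sd


# Hypothetical scores: only those at or above 130 meet the definition
for score in (118, 129, 130, 142):
    print(score, is_gifted(score))
```

The rule removes doubt about what the researcher means, but it does nothing to show that the definition rests on a valid theory; that judgment remains outside the computation.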


EXPERIMENTAL VALIDITY 


To make a significant contribution to the development of knowledge, an 
experiment must be valid. Campbell and Stanley (1966) described two types 
of experimental validity, internal validity and external validity. Cook and 
Campbell (1979) further divided experimental validity, adding two other 
types, statistical validity and construct validity. For purposes of this introduc- 
tory treatment of the issue, we will confine our discussion to the two types 
of experimental validity described by Campbell and Stanley. 


Internal validity. An experiment has internal validity to the extent that 
the factors that have been manipulated (independent variables) actually 
have a genuine effect on the observed consequences (dependent variables) 
in the experimental setting. 


External validity. The researcher would achieve little of practical value 
if these observed variable relationships were valid only in the experimental 
setting and only for those participating. External validity is the extent to 
which the variable relationships can be generalized to other settings, other 
treatment variables, other measurement variables, and other populations. 

Experimental validity is an ideal to aspire to, for it is unlikely that it 
can ever be completely achieved. Internal validity is very difficult to achieve 
in the nonlaboratory setting of the behavioral experiment where there are 
so many extraneous variables to attempt to control. When experimental 
controls are tightened to achieve internal validity, a more artificial, less
realistic situation may prevail, reducing the external validity or generaliz- 
ability of the experiment. Some compromise is inevitable so that a reason- 
able balance may be established between control and generalizability— 
between internal and external validity. 


Threats to Internal Experimental Validity


In educational experiments, or in any behavioral experiments, a number 
of extraneous variables are present in the situation or are generated by the 
experimental design and procedures. These variables influence the results 
of the experiment in ways that are difficult to evaluate. In a sense, they
introduce rival hypotheses that could account for experimental change not
attributable to the experimental variables under consideration. Although
these extraneous variables usually cannot be completely eliminated, many 
of them can be identified. It is important that behavioral researchers an- 
ticipate them and take all possible precautions to minimize their influence 
through sound experiment design and execution. 

A number of factors jeopardize the power of the experimenter to 
evaluate the effects of independent variables unambiguously. Campbell 
and Stanley (1966) have discussed these factors in their excellent definitive 
treatment. They include the following: 


Maturation. Subjects change in many ways over a period of time, and 
these changes may be confused with the effect of the independent variables 
under consideration. During the course of a study, the subjects might 
become more tired, wiser, hungrier, older, and so on. They may be influ- 
enced by the incidental learnings or experiences that they encounter through 
normal maturation. This threat is best controlled by randomly assigning 
subjects to experimental and control groups. Differences between the groups 
would then be considered to be due to the treatment rather than to ma- 
turation. 


History. Specific external events occurring between the first and sec- 
ond measurements and beyond the control of the researcher may have a 
stimulating or disturbing effect upon the performance of subjects. The 
effect of a fire drill, the emotional tirade of a teacher, a pep session, the 
anxiety produced by a pending examination, or a catastrophic event in the 
community may significantly affect the test performance of a group of 
students. 

In many experiments, these external events will have a similar effect 
upon both experimental and control subjects, in which case this threat is 
controlled. However, because they are specific events, they may affect one 
group but not the other. The effect of these uncontrolled external events 
is one of the hazards inherent in experiments carried on outside the lab- 
oratory. In laboratory experiments these extraneous variables can be con- 
trolled more effectively. 


Testing. The process of pretesting at the beginning of an experiment 
can produce a change in subjects. Pretesting may produce a practice effect 
that can make subjects more proficient in subsequent test performance. 
Testing presents a threat to internal validity that is common to pretest- 
posttest experiments. Of course, an equivalent control group would be 
affected by the test in a similar way as the experimental group. Thus, having
experimental and control groups controls for this threat in the same way
that it does for the threat of maturation.


Unstable instrumentation. Unreliable instruments or techniques used to
describe and measure aspects of behavior are threats to the validity of an
experiment. If tests used as instruments of observation are not accurate or
consistent, a serious element of error is introduced. If human observers
are used to describe behavior changes in subjects, changes in observers or
in their standards due to fatigue, increased insight or skill, or changes in 
criteria of judgment over a period of time are likely to introduce error. 


Statistical regression. Statistical regression, also known as regression to 
the mean, is a phenomenon that sometimes operates when subjects are selected
on the basis of extremely high or extremely low pretest scores and
when the measurement device is not totally reliable, which is common. 
Subjects who score very high, near the ceiling, on a pretest, will most likely 
score lower (nearer the mean) on a subsequent testing. Subjects who score 
very low, near the floor, on a pretest will most likely score higher (nearer 
the mean) on a subsequent testing. The reader should be aware that this 
phenomenon only occurs when subjects are selected as a group because of 
their extreme scores and that the regression we are referring to is for the 
group as a whole, not all individuals. Posttest scores for some individuals
may move in the direction opposite to that expected for the group.

The purpose of a study may require the researcher to select subjects 
based on their extreme scores. A study of the effects of different remedial 
reading programs assumes that the subjects must need remedial reading 
instruction and, therefore, have very low reading scores on the pretest. To 
control for regression to the mean, the researcher would randomly assign 
his or her sample of poor readers to the experimental and control groups. 
Since both groups would be expected to improve equally because of regres- 
sion to the mean, if the experimental group improved significantly more 
than the control group, the researcher could conclude that this was due to 
the experimental treatment rather than statistical regression. 
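
Regression to the mean can be demonstrated with a small simulation. The sketch below rests on assumptions that are not part of the text: each observed score is a stable true score plus independent random measurement error, the lowest tenth of pretest scorers is selected, and no treatment at all is given. The selected subjects are then split at random into two groups to show that both "improve" by about the same amount.

```python
import random
from statistics import mean

rng = random.Random(42)   # fixed seed so the run is reproducible
N = 2000

# Assumed model: observed score = true score + random measurement error
true_scores = [rng.gauss(100, 10) for _ in range(N)]
pretest = [t + rng.gauss(0, 10) for t in true_scores]
posttest = [t + rng.gauss(0, 10) for t in true_scores]

# Select the subjects with the lowest tenth of pretest scores
cutoff = sorted(pretest)[N // 10]
low_scorers = [i for i in range(N) if pretest[i] < cutoff]

# Randomly assign the selected subjects to two groups; neither is treated
rng.shuffle(low_scorers)
half = len(low_scorers) // 2
group_a, group_b = low_scorers[:half], low_scorers[half:]


def mean_gain(indices):
    """Mean posttest-minus-pretest change for the given subjects."""
    return mean(posttest[i] - pretest[i] for i in indices)


gain_a = mean_gain(group_a)
gain_b = mean_gain(group_b)
print(f"Mean gain, group A: {gain_a:.1f}")
print(f"Mean gain, group B: {gain_b:.1f}")
print(f"Mean change, all subjects: {mean_gain(range(N)):.1f}")
```

Both selected groups gain although nothing was done to them, while the full sample shows almost no change. In a real study, only a gain in the experimental group beyond that of the randomly assigned control group could be credited to the treatment.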


Selection bias. Selection bias is represented by the nonequivalence of 
experimental and control groups, and its most effective deterrent is the 
random assignment of subjects to treatments. Selection bias is likely when,
upon invitation, volunteers are used as members of an experimental group. 
Although they may appear to be equated to the nonvolunteers, their char- 
acteristics of higher motivation may introduce a bias that would invalidate 
reasonable comparison. Selection bias may be introduced when intact classes 
are used as experimental and control groups: Because of scheduling ar- 
rangements, an English class meeting during the fourth period may consist 
of particularly able students who are scheduled at that period because they 
are also enrolled in an advanced mathematics class.


Interaction of selection and maturation. This type of threat to the internal
validity of a study is not the same as selection bias. The interaction of
selection and maturation may occur wherever the subjects can select which
treatment (e.g., which instructional method) they will receive. Even though
the groups may be equivalent on the pretest and other cognitive measures, 
the reasons why some people choose one treatment over another may be 
related to the outcome measure (dependent variable). Thus, if more mo- 
tivated students chose method A for learning calculus over method B be- 
cause A is harder and requires greater academic motivation, that differ- 
ential motivation might be confused for the effects of the experimental 
variable. 


Experimental mortality. Mortality, or loss of subjects, particularly likely 
in a long-term experiment, introduces a potentially confounding element. 
Even though experimental and control groups are randomly assigned, the 
survivors might represent groups that are quite different from the unbiased 
groups that began the experiment. Those who survive a period of exper- 
imentation are likely to be healthier, more able, or more highly motivated 
than those who are absent frequently or who drop out of school and do
not remain for the duration of the experiment. The major concern here 
is whether the groups experienced different loss rates or reasons for drop- 
outs that might confound the results. 

Experimenter bias. This is a type of bias introduced when the re- 
searcher has some previous knowledge about the subjects involved in an 
experiment. This knowledge of subject status may cause the researcher to 
convey some clue that affects the subject’s reaction or may affect the
objectivity of his or her judgment.

In medical research it is common practice to conceal from the sub- 
ject the knowledge of who is receiving the placebo and who the experi- 
mental medication. This is known as a blind. Having someone other than 
the experimenter administer the treatments and record which subjects are 
receiving the medication and which the placebo provides an additional 
safeguard. This practice, known as a double blind, helps to minimize con- 
tamination. 

Beginners in educational research have been known to contaminate 
a study by classifying student performance when they know the nature of 
the variable to be correlated with that performance. In a simple ex post facto
study a student proposed to determine the relationship between academic 
achievement and citizenship grades in her class. Since she proposed to 
assign the citizenship grades herself, it would seem apparent that an ele- 
ment of contamination would result. Her knowledge of the student's pre- 
vious academic achievement would tend to precondition her judgment in 
assigning citizenship grades.

In educational studies of this type, researchers would minimize
contamination if outside observers rated the subjects without any knowledge
of their academic status.


Threats to External Experimental Validity


Laboratory research has the virtue of permitting the experimenter to care- 
fully avoid threats to internal validity. However, the artificial nature of such 
a setting greatly reduces the generalizability of the findings from such 
research. Since educational researchers are primarily concerned with the 
practical uses of their findings, they frequently conduct their studies in real 
classroom situations. While these real-life settings present opportunities for 
greater generalization, they do not automatically result in externally valid 
research. Campbell and Stanley (1966) also discussed the factors that may 
lead to reduced generalizability of research to other settings, persons, var- 
iables, and measurement instruments. The factors they discussed include 
the following: 


Interference of prior treatment. In some types of experiments the effect 
of one treatment may carry over to subsequent treatments. In an educa- 
tional experiment, learning produced by the first treatment is not com- 
pletely erased and its influence may accrue to the advantage of the second 
treatment. This is one of the major limitations of the single-group, equated- 
materials experimental design in which the same subjects serve as members 
of both control and experimental groups. If an equated-materials design 
is necessary, a counterbalanced design will generally control for this threat. 


The artificiality of the experimental setting. In an effort to control extra- 
neous variables the researcher imposes careful controls which may intro- 
duce a sterile or artificial atmosphere that is not at all like the real-life 
situation about which generalizations are desired. The reactive effect of 
the experimental process is a constant threat.


Interaction effect of testing. The use of a pretest at the beginning of a 
study may sensitize individuals by making them more aware of concealed 
purposes of the researcher and may serve as a stimulus to change. This is 
a different potential problem than that of testing, discussed earlier as a 
threat to internal validity. 

With testing, the threat was that the pretest would affect the subjects’ 
performance on the posttest in a direct fashion. That was easily controlled 
by having a control group. In the case of the interaction effect of testing, 
we have a more difficult problem. Here the pretest may alert the experi- 
mental group to some aspect of the interventions that is not present for 
the control group. That is, the pretest may interact differently with the 
experimental intervention than it does with the control or placebo condi- 
tions. To avoid this threat requires random assignment and either no pre- 
test or the Solomon four-group design discussed in the next section. 


Interaction of selection and treatment. Researchers are rarely, if ever,
able to randomly select samples from the wide population of interest or
randomly assign to groups; consequently, generalization from samples to 
populations is hazardous. Samples used in most classroom experiments are 
usually composed of intact groups, not randomly selected individuals. They 
are based upon an accepted invitation to participate. Some school officials 
agree to participate; others refuse. One cannot assume that samples taken 
from cooperating schools are necessarily representative of the target pop- 
ulation. Such schools are usually characterized by faculties that have high 
morale, less insecurity, greater willingness to try a new approach, and a 
greater desire to improve their performance. 


The extent of treatment verification. Due to the potential threat of ex- 
perimenter bias, most researchers have research assistants, or others who 
are not directly involved in the formulation of the research hypotheses, 
deliver the treatment. This leads to a potential threat to external validity. 
Was the treatment administered as intended and described by the re- 
searcher? The researcher must have a verification procedure (e.g., direct 
observation, videotape) to make sure that the treatment was properly ad- 
ministered. 

After reading about these threats to experimental validity, the begin- 
ner is probably ready to conclude that behavioral research is too hazardous 
to attempt. Particularly outside of the laboratory, ideal experimental con- 
ditions and controls are never likely to prevail. However, an understanding 
of these threats is important so that the researcher can make every effort 
to remove or minimize their influence. If one were to wait for a research 
setting free from all threats, no research would ever be carried on. Knowing 
the limitations and doing the best that he or she can under the circum- 
stances, the researcher may conduct experiments, reach valid conclusions, 
provide answers to important questions, and solve significant problems. 


EXPERIMENTAL DESIGN 


Experimental design is the blueprint of the procedures that enable the 
researcher to test hypotheses by reaching valid conclusions about relation- 
ships between independent and dependent variables. Selection of a par- 
ticular design is based upon the purposes of the experiment, the type of 
variables to be manipulated, and the conditions or limiting factors under 
which it is conducted. The design deals with such practical problems as 
how subjects are to be assigned to experimental and control groups, the 
way variables are to be manipulated and controlled, the way extraneous 
variables are to be controlled, how observations are to be made, and the 
type of statistical analysis to be employed in interpreting data relationships. 

The adequacy of experimental designs is judged by the degree to
which they eliminate or minimize threats to experimental validity. Three
categories are presented here:


1. Pre-experimental design: the least effective, for it provides either no
control group or no way of equating the groups that are used.

2. True experimental design: employs randomization to provide for
control of the equivalence of groups and exposure to treatment.

3. Quasi-experimental design: provides a less satisfactory degree of
control, used only when randomization is not feasible.


A complete discussion of experimental design would be too lengthy 
and complex for this introductory treatment. Therefore, only a relatively 
few designs will be described. Readers may wish to refer to Campbell and 
Stanley's (1966) and Cook and Campbell's (1979) excellent treatments of 
the subject, in which many more designs are described. 

In discussing experimental designs, we have followed Campbell and 
Stanley's symbol system. 


R random assignment of subjects to groups or treatments 

X exposure of a group to an experimental (treatment) variable 
C exposure of a group to the control or placebo condition 

O observation or test administered 


Pre-Experimental Designs 


The least adequate of designs is characterized by: (1) the lack of a control 
group, or (2) a failure to provide for the equivalence of a control group.


The one-shot case study 


X  O


Carefully studied results of a treatment are compared with a general 
expectation of what would have happened if the treatment had not been 
applied. This design provides the weakest basis for generalization. 

Mr. Jones used a 25-minute film on racial integration in his junior 
high school history class. In a test administered after the showing of the 
film, the mean score was 86 (a high score indicated a favorable attitude 
toward acceptance of all racial groups). Mr. Jones believes that the mean 
score was higher than it would have been had the film not been viewed 
and, as he recalls, higher than the mean score of a test that he had ad- 
ministered to a similar class several years before. He concludes that the 
film has been effective in reducing racial prejudice. 

However, Mr. Jones has come to this conclusion on the basis of
inadequate data. Because there was no pretest, the reader has no way of
knowing whether a change has occurred, or whether a similar group who had
not seen the film (a control group) would have scored differently than the
group viewing the film. This design is the poorest available and should not
be used.


The one-group, pretest-posttest design 


O1  X  O2
O1 = pretest    O2 = posttest


This design provides some improvement over the first, for the effects 
of the treatment are judged by the difference between the pretest and the 
posttest scores. No comparison with a control group is provided. 

In the same setting, Mr. Jones administered a pretest before showing 
the film and a posttest after the viewing. He computed the mean difference 
between the pretest and the posttest scores and found that the mean had 
increased from 52 to 80, a mean gain of 28 score points. He also apparently 
detected some temporary improvement in attitude toward racial integration.
He concludes that there has been a significant improvement in attitude
as a result of viewing the film. But what about the sensitizing effect of the 
pretest items that may have made the students aware of issues that they 
had not even thought of before? What would the gain have been if the 
pretest and the posttest had been administered to another class that had 
not viewed the film? Threats to the internal validity that are not controlled 
include history, maturation, testing, and so forth. External validity is also
poor.


The static-group comparison design 


X  O
C  O


This design compares the status of a group that has received an 
experimental treatment with one that has not. There is no provision for 
establishing the equivalence of the experimental and control groups, a very 
serious limitation. 

A beginning researcher administered the 25-minute racial integration 
film to a group of elementary teachers in one school. He then administered 
the attitude scale and computed the mean score. At another elementary 
school he administered the attitude scale to teachers who had not viewed 
the film. A comparison of mean scores shows that the teachers who had 
viewed the film had a higher mean score than those who had not. He 
concluded that the film was an effective device in reducing racial prejudice. 

What evidence did he have that the initial attitudes of the groups
were equivalent? Without some evidence of equivalence of the control and
experimental groups, attributing the difference to the experimental vari- 
able is unwarranted. 


True Experimental Designs 


In a true experiment the equivalence of the experimental and control 
groups is provided by random assignment of subjects to experimental and 
control treatments. Although it is difficult to arrange a true experimental 
design, particularly in school classroom research, it is the strongest type of 
design and should be used whenever possible. Three experimental designs 
are discussed in the following sections. 


The posttest-only, equivalent-groups design 


R  X  O1
R  C  O2


This design is one of the most effective in minimizing the threats to 
experimental validity. It differs from the static group comparison design 
in that experimental and control groups are equated by random assignment.
At the conclusion of the experimental period the difference between
the mean test scores of the experimental and control groups is subjected
to a test of statistical significance, such as a t test or an analysis of
variance. The
assumption is that the means of randomly assigned experimental and con- 
trol groups from the same population will differ only to the extent that 
random sample means from the same population will differ as a result of 
sampling error. If the difference between the means is too great to attribute 
to sampling error, the difference may be attributed to the treatment variable 
effect. 

Using a table of random numbers, the researcher selects 80 students 
from a school population of 450 sophomores. The 80 students are randomly 
assigned to experimental and control treatments, using 40 as the experi- 
mental group and 40 as the control group. The experimental group is 
taught the concepts of congruence of triangles by an experimental
procedure, method X, and the control group is taught the same set of concepts
by the usual method, method C. All factors of time of day, treatment length 
in time, and other factors are equated. At the end of a 3-week period the 
experimental and control groups are administered a test, and the difference 
between mean scores is subjected to a test of statistical significance. The 
difference between mean scores is found to favor the experimental group, 
but not by an amount that is statistically significant. The researcher rightly 
concludes that the superiority of the X group could well have been the
result of sampling error and that there was no evidence of the superiority
of the X method. 
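
The comparison of the difference between means with sampling error can be sketched as a pooled-variance t test for two independent groups. Everything specific here is an assumption made for illustration: the scores are invented, the groups have 8 students each rather than 40, and the function name pooled_t is hypothetical. The critical value 2.145 is the tabled two-tailed value of t at the .05 level for 14 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_1, group_2):
    """Return (t, df) for two independent groups using the pooled variance."""
    n1, n2 = len(group_1), len(group_2)
    df = n1 + n2 - 2
    # Weighted average of the two sample variances (the error variance)
    pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group_1) - mean(group_2)) / standard_error, df


# Hypothetical posttest scores for method X and method C
method_x = [78, 85, 69, 91, 74, 82, 88, 77]
method_c = [75, 80, 70, 86, 72, 79, 83, 71]

t_value, df = pooled_t(method_x, method_c)
CRITICAL_T = 2.145   # tabled value: two-tailed, .05 level, df = 14
significant = abs(t_value) > CRITICAL_T

print(f"t = {t_value:.2f}, df = {df}, significant at .05: {significant}")
```

With these invented scores the X group has the higher mean, but the value of t falls short of the critical value, which parallels the outcome of the example above: the observed difference could well be the result of sampling error.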


The pretest-posttest equivalent-groups design 


R  O1  X  O2    X gain = O2 - O1    O1 O3 = pretests
R  O3  C  O4    C gain = O4 - O3    O2 O4 = posttests


This design is similar to the previously described design, except that 
pretests are administered before the application of the experimental and 
control treatments and posttests at the end of the treatment period. Gain 
scores may be compared and subjected to a test of the significance of the 
difference between means. Pretest scores can also be used in analysis of 
covariance to statistically control for any differences between the groups 
at the beginning of the study. This is a strong design, but there may be a 
possibility of the influence of the effect of testing and the interaction with 
the experimental variable. 

Watanabe, Hare, and Lomax (1984) have reported on a study that 
included a pretest-posttest equivalent-groups design. This study tested
whether a procedure for teaching eighth-grade students to predict the
content of newspaper stories from their headlines would leave them better
at the task than a control group of eighth-grade students. A pilot study,
reported in their article, indicated
that even good middle-school readers have difficulty predicting the content 
of news stories from the headlines, but that college students have no trouble 
with this task. Because the eighth-graders they surveyed reported reading 
primarily comics, movie, and sport sections (which might explain their poor 
prediction of content from headlines) and because most teachers would 
prefer that their students read more of the newspaper, the authors felt 
that it would be useful to determine if a training program could teach 
eighth-graders how to better understand headlines. 

Watanabe et al. randomly assigned 46 eighth-graders to either head- 
line reading instruction (experimental group) or regular reading instruc- 
tion (control group). All 46 students were asked to read 20 headlines and 
predict story content prior to, and after, a 3-week period of instruction. 
The authors scored each attempt to predict story content on a scale of 0 
to 4, with 0 indicating that the student's response explained nothing and 
4 indicating an "on-target potential prediction" (pp. 439-440). Thus each
student could receive a score from 0 to 80 on each of the testings. 

At the end of the 3 weeks of instruction, the authors compared the 
two groups using analysis of covariance (ANCOVA) and found that the 
experimental group was better able to predict story content from headlines 
after training than the control group. ANCOVA was used because even 
with random assignment the groups were not exactly equal. ANCOVA
permitted the authors to statistically control for differences on the pretest
so that posttest differences would not be due to initial differences prior to 
training. 
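
The adjustment that analysis of covariance makes for pretest differences can be illustrated by computing adjusted posttest means. The sketch below is only a partial illustration, not a full ANCOVA: it computes the pooled within-group regression slope of posttest on pretest and uses it to adjust each group's posttest mean to the grand pretest mean, but it does not carry out the significance test. The scores are invented (they are not the data of Watanabe et al.), and the function name adjusted_means is hypothetical.

```python
from statistics import mean


def adjusted_means(groups):
    """Adjust each group's posttest mean for its pretest mean.

    groups maps a group name to a (pretest scores, posttest scores) pair.
    Returns (dict of adjusted posttest means, pooled within-group slope).
    """
    grand_pre = mean(x for pre, _ in groups.values() for x in pre)
    sxx = sxy = 0.0
    for pre, post in groups.values():
        mean_pre, mean_post = mean(pre), mean(post)
        sxx += sum((x - mean_pre) ** 2 for x in pre)
        sxy += sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post))
    slope = sxy / sxx   # pooled within-group regression slope
    adjusted = {
        name: mean(post) - slope * (mean(pre) - grand_pre)
        for name, (pre, post) in groups.items()
    }
    return adjusted, slope


# Hypothetical scores on a 0-to-80 scale
groups = {
    "experimental": ([30, 34, 38, 42], [50, 55, 57, 62]),
    "control": ([34, 38, 42, 46], [44, 47, 51, 54]),
}
adjusted, slope = adjusted_means(groups)
print(f"slope = {slope:.2f}")
for name, value in adjusted.items():
    print(f"{name}: adjusted posttest mean = {value:.1f}")
```

In these invented data the control group began with the higher pretest mean, so the adjustment widens the posttest difference in favor of the experimental group. A complete analysis would go on to test that adjusted difference for significance, as described in the references on analysis of covariance cited earlier.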


The Solomon four-group design 


R  O1  X  O2
R  O3  C  O4
R      X  O5
R      C  O6


In this design: 


1. Subjects are randomly assigned to four groups.
2. Two groups receive the experimental treatment (X).
3. One experimental group receives a pretest (O1).
4. Two groups (control) do not receive treatment (C).
5. One control group receives a pretest (O3).
6. All four groups receive posttests (O2 O4 O5 O6).


The design is really a combination of the two group designs previously 
described, the posttest only and the pretest-posttest. It is possible to evaluate 
the effects of testing, history, and maturation. Analysis of variance is used 
to compare the four posttest scores, analysis of covariance to compare gains 
in O2 and O4.

Because this design provides for two simultaneous experiments, the 
advantages of a replication are incorporated. A major difficulty is finding 
enough subjects to assign randomly to four equivalent groups. 
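
Because the four groups cross two factors, treatment versus control and pretested versus not pretested, the four posttest means can be compared as a two-by-two layout. The sketch below shows only the arithmetic of those comparisons, using invented group means; the function name solomon_contrasts is hypothetical, and the significance tests described above are not performed.

```python
def solomon_contrasts(o2, o4, o5, o6):
    """Compare the four posttest means of the Solomon four-group design.

    o2: pretested, treatment (X)      o4: pretested, control (C)
    o5: not pretested, treatment (X)  o6: not pretested, control (C)
    """
    treatment_effect = ((o2 + o5) - (o4 + o6)) / 2   # X groups versus C groups
    testing_effect = ((o2 + o4) - (o5 + o6)) / 2     # pretested versus not pretested
    interaction = (o2 - o4) - (o5 - o6)              # does pretesting change the X effect?
    return {
        "treatment": treatment_effect,
        "testing": testing_effect,
        "interaction": interaction,
    }


# Hypothetical posttest means for the four groups
contrasts = solomon_contrasts(o2=80, o4=60, o5=74, o6=58)
for name, value in contrasts.items():
    print(f"{name}: {value}")
```

A nonzero interaction in real data would suggest that the pretest sensitized subjects to the treatment, which is the interaction effect of testing discussed earlier as a threat to external validity.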


Quasi-Experimental Designs 


These designs provide control of when and to whom the measurement is 
applied, but because random assignment to experimental and control treatments 
has not been applied, the equivalence of the groups is not assured. Of the 
many quasi-experimental designs, only five are described. See Cook and 
Campbell (1979) for a comprehensive review of quasi-experimental designs. 


The pretest-posttest nonequivalent-groups design 


O1   X   O2        O1 O3 = pretests 
O3   C   O4        O2 O4 = posttests 


This design is often used in classroom experiments when experimental 
and control groups are such naturally assembled groups as intact classes, 
which may be similar. The difference between the mean of the O1 and O2 
scores and the difference between the mean of the O3 and O4 scores (mean 
gain scores) are tested for statistical significance. Analysis of covariance may 
also be used. Because this design may be the only feasible one, the 
comparison is justifiable, but the results should be interpreted cautiously. 
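
The mean gain scores mentioned above can be computed as in the following sketch. The class scores are hypothetical and the function name is an assumption; the sketch shows only the descriptive comparison, not the significance test.

```python
from statistics import mean

def mean_gain(pretest, posttest):
    """Average of each pupil's posttest score minus pretest score."""
    return mean(post - pre for pre, post in zip(pretest, posttest))

# Hypothetical scores for two intact classes (illustration only).
exp_pre, exp_post = [20, 25, 30, 35], [32, 35, 41, 48]   # O1, O2
ctl_pre, ctl_post = [22, 27, 31, 36], [28, 34, 36, 42]   # O3, O4

exp_gain = mean_gain(exp_pre, exp_post)
ctl_gain = mean_gain(ctl_pre, ctl_post)
gain_difference = exp_gain - ctl_gain

# Because the classes were not randomly formed, it is worth checking
# how far apart they already were on the pretest.
pretest_gap = mean(exp_pre) - mean(ctl_pre)
print(exp_gain, ctl_gain, gain_difference, pretest_gap)
```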

Two first-grade classes in a school were selected for an experiment. 
One group was taught by the initial teaching alphabet (ITA) approach to 
reading, and the other was taught by the traditional alphabet approach. 
Prior to the introduction of the two reading methods and again at the end 
of the school year, both groups were administered a standardized reading 
test, and the mean scores of the two groups were compared. The ITA 
group showed a significant superiority in test scores over the conventional 
alphabet group. However, without some evidence of the equivalence of the 
groups in intelligence, maturity, readiness, and other factors at the begin- 
ning of the experimental period, conclusions should be cautiously inter- 
preted. 


The Follow Through Planned Variation Study. An interesting example of 
the pretest-posttest nonequivalent groups design was the Follow Through 
Planned Variation Study (Abt Associates, 1977), conceived in the late 1960s 
and initiated and funded by the United States Office of Education. The 
purpose of the program was to implement and evaluate a variety of 
compensatory programs, extending the services of Project Head Start for 
disadvantaged children into the primary grades. Head Start was a large-scale 
enterprise, including many innovative instructional models and involving 
the expenditure of more than a half billion dollars. The program ex- 
tended over a period of more than 9 years, with more than 79,000 first-, 
second-, and third-grade children participating. Of the twenty different 
instructional models and 170 projects, 17 models and 70 projects were 
selected for evaluation. Approximately 2 percent of the total number of 
children were included in the evaluation. 

Participation by school districts was voluntary, with each district se- 
lecting the particular model that it wished to implement and helping to 
choose the groups that were to be used as controls. Treatments were not 
randomly assigned nor control groups randomly selected. 

The unit of analysis was pupil mean gain for groups K-3 and 1-3 
growth scores, statistically compared by instructional model and by project, 
using variants of linear regression and analysis of covariance. Outcome 
measures were derived from gain scores on the following measuring in- 
struments: 


1. The Metropolitan Achievement Test Battery covering such basic skills as 
reading comprehension, spelling, word usage and analysis, and 
mathematical computation, concepts, and problem solving. 

2. The Raven’s Coloured Matrices Test, a nonverbal test of problem-solving 
ability, requiring the manipulation of geometric patterns, essentially 
a measure of intelligence rather than a measure of learning outcomes. 

3. The Coopersmith Self-Image Inventory, a measure of self-esteem, but ques- 
tioned on the grounds that it required a maturity of judgment beyond 
the competence of primary age children. 

4. The Intellectual Achievement Responsibility Scale which attempts to assess 
the child’s experience of success or failure, indicating the degree to 
which the child attributes success to internal or external causes. This 
instrument was also judged to require insights beyond the maturity 
level of small children. 


There have been many analyses and evaluations of the program by 
official and independent agencies funded by the United States Office of 
Education and by private philanthropic foundations. Among the evaluating 
agencies were the Office of Education; Abt Associates, Inc.; The Stanford 
Research Institute; The Huron Institute; and The Center for Research 
and Curriculum Evaluation of the University of Illinois. 

It is unlikely that any large-scale study has been scrutinized so exten- 
sively concerning research design, procedures employed, and interpreta- 
tion of the data. There have been critiques of the evaluations and critiques 
of the critiques, with sharp disagreement on most aspects of the study 
(Anderson, St. Pierre, Proper, & Stebbins, 1978; House, Glass, McLean, & 
Walker, 1978; Wisler, Burns, & Iwamoto, 1978). 

However, the consensus is that the findings were disappointing, be- 
cause most of the experimental effects were negligible. Only a few of the 
treatment effects produced as much as a one-quarter standard deviation 
change. (This concept is discussed in Chapter 8.) Of those that met this 
criterion, two instructional models with at least one positive effect were 
structured approaches. Three models with at least one negative effect were 
unstructured approaches. Few of either cognitive, structured approaches 
or child-centered, nonstructured approaches yielded significant effects. 
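
The one-quarter standard deviation criterion can be illustrated with a short sketch. The scores are hypothetical, and dividing by the comparison group's standard deviation is an assumption made here; the source does not state which standard deviation the evaluators used.

```python
from statistics import mean, stdev

def standardized_effect(treated, comparison):
    """Difference between group means in comparison-group SD units."""
    return (mean(treated) - mean(comparison)) / stdev(comparison)

def meets_criterion(treated, comparison, threshold=0.25):
    """True when the effect is at least one-quarter standard deviation."""
    return abs(standardized_effect(treated, comparison)) >= threshold

# Hypothetical achievement scores (illustration only).
comparison = [40, 44, 48, 52, 56]
model_a = [42, 46, 50, 54, 58]   # mean is 2 points higher
model_b = [41, 45, 49, 53, 57]   # mean is 1 point higher

effect_a = standardized_effect(model_a, comparison)
effect_b = standardized_effect(model_b, comparison)
print(round(effect_a, 3), meets_criterion(model_a, comparison))
print(round(effect_b, 3), meets_criterion(model_b, comparison))
```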

Much of the disagreement centered around the reasons why the study 
was ineffective. Several explanations have been suggested. 


1. The research was deficient in design, implementation, statistical 
analysis, and interpretation. Because experimental treatments were not 
randomly selected and control groups were not randomly assigned, 
mismatching resulted and comparisons were really made between 
different populations. 

2. There was great intersite difference in effectiveness within a given 
instructional model. Most of the within-model differences were greater 
than the between-models difference. There may have been serious 
deficiencies in the competence of those who implemented the innovative 
procedures or in the actual method implemented, even though 
the teachers and their teacher aides were specially trained and their 
activities monitored by the project sponsors. 

3. The measuring instruments may have been incompatible with the 
goals of the project because of inadequate identification and definition 
of appropriate outcome variables. The more effective instruments 
seemed to focus on basic skills or traditional educational goals rather 
than on goals ordinarily associated with nonstructured approaches to 
education. Some measured intellectual status rather than achievable 
learning goals. Others appeared to require a maturity of response too 
complex for primary-age children. 


Not all reactions to the study were negative. Hodges, a member of 
the Follow Through Task Force, lists a number of reasons for viewing the 
program as significant and worthwhile. “Just because Follow Through has 
not proved to be an easy, workable, inexpensive solution for all the edu- 
cational problems of poor children does not mean it should be dismissed 
as just another failure in compensatory education" (Hodges, 1978, p. 191). 

In behavioral research, the random selection and assignment of sub- 
jects to experimental and control groups may be impracticable. Because of 
administrative difficulties in arranging school experiments, it may be nec- 
essary to use the same group as both the experimental and control group. 
These designs have two apparently attractive features. They can be carried 
on with one intact group without a noticeable reorganization of the class- 
room schedule. The changes in procedures and testing can be concealed 
within ordinary classroom routines. Artificiality can be minimized, for the 
procedures can be introduced without the subjects' awareness of partici- 
pation in an experiment. 


The time-series design. At periodic intervals, observations (measure- 
ments) are applied to individuals or a group. An experimental variable (X) 
is introduced, and its effect may be judged by the change or gain from the 
measurement immediately before to the one immediately after its intro- 
duction. The purpose of the series of measurements before and after the 
intervention or treatment is to demonstrate little or no change except im- 
mediately after the intervention. 

In the time-series experimental design, a measured change or gain 
from observation 4 to observation 5 would indicate that the treatment had 
an effect. This design is particularly sensitive to the failure to control the 
extraneous variable, history, for it is possible that some distracting, simul- 
taneous event at the time of the intervention would provide a rival hy- 
pothesis for the change. 


O1 O2 O3 O4 X O5 O6 O7 O8 

The diagram showing one X and several Os does not necessarily represent 
the relative number of sessions for each. It may be that each O 
represents one measurement while the single X represents an intervention 
of several weeks. While it is better to have several observations, as shown, 
it is not always possible to have this many. For instance, a recent time-series 
experiment by a student used only two preintervention and two 
postintervention measures. Because this study was measuring the effect of a 
program to reduce the number of criminal victimizations of disabled students, 
it was necessary to have a 2-month period between measurements in order to 
have a sufficient number of victimizations for each period measured. That 
is, O1 in November measured September and October crimes, O2 in January 
measured crimes in November and December, and so on. 
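
The logic of the time-series design, little or no change except immediately after the intervention, can be sketched as follows. The observations are hypothetical and the function name is an assumption; the sketch simply sets the change across the intervention beside the largest change between any other pair of adjacent observations.

```python
def time_series_changes(observations, n_before):
    """observations: measurements in time order (O1, O2, ...).
    n_before: number of observations made before the intervention X.

    Returns the change across the intervention and the largest absolute
    change between any other pair of adjacent observations.
    """
    changes = [later - earlier
               for earlier, later in zip(observations, observations[1:])]
    at_intervention = changes[n_before - 1]
    others = changes[:n_before - 1] + changes[n_before:]
    largest_other = max(abs(change) for change in others)
    return at_intervention, largest_other

# Hypothetical scores for O1 through O8, with X between O4 and O5.
scores = [12, 13, 12, 13, 20, 21, 20, 21]
result = time_series_changes(scores, n_before=4)
print(result)
```

A change at the intervention that is much larger than the fluctuation elsewhere in the series is consistent with a treatment effect, although it does not rule out history as a rival hypothesis.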


The equivalent time-samples design. Instead of having equivalent 
samples of persons, it may be necessary to use one group as both the 
experimental and control group. In this design, the experimental condition 
(X1) is present between some observations and absent (X0) between others. 
This may be diagrammed as shown below, although the number of observations 
and interventions varies and the alternation of the experimental condition 
with the control condition would normally be random rather than systematic 
as shown here. 


O1 X1 O2 X0 O3 X1 O4 X0 O5 
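
A minimal sketch of this design follows. It shows how a random order of conditions might be drawn and how the observations that follow each condition can be averaged. The function names and scores are assumptions made for illustration.

```python
import random
from statistics import mean

def random_schedule(n_each, seed):
    """Random order of n_each X1 periods and n_each X0 periods."""
    schedule = ["X1"] * n_each + ["X0"] * n_each
    random.Random(seed).shuffle(schedule)
    return schedule

def condition_means(schedule, observations):
    """Average the observation that follows each period, by condition."""
    by_condition = {"X1": [], "X0": []}
    for condition, value in zip(schedule, observations):
        by_condition[condition].append(value)
    return {condition: mean(values)
            for condition, values in by_condition.items()}

schedule = random_schedule(n_each=2, seed=3)

# The systematic order shown in the diagram, with hypothetical scores
# for O2 through O5 (O1 is the initial observation).
systematic = ["X1", "X0", "X1", "X0"]
following = [18, 11, 19, 12]
means = condition_means(systematic, following)
print(schedule, means)
```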


A study by Hall et al. (1973) illustrates a version of the equivalent 
time-samples design. Five subjects, identified as the most violently aggres- 
sive, were selected from a group of 46 mentally retarded boys living in an 
institution dormitory. Their ages ranged from 12 to 16 (mean, 13.8); their 
IQs from 40 to 62 (mean, 50). Each subject was observed for 10 weeks 
in 10 randomly selected 3-minute periods, during which time acts of ag- 
gressive behavior were recorded. Acts were classified as motor aggressive 
(throwing objects, kicking, fighting, scratching) and nonmotor aggressive 
(verbal abuse, screaming or shouting, insubordination). 

The observations were scheduled in four periods: 


1. Observation (baseline) session 1 

2. On-reinforcement sessions 2, 3, 4, 5 

3. Off-reinforcement sessions 6, 7 

4. On-reinforcement sessions 8, 9, 10 


Positive reinforcement as a reward for nonaggressive behavior 
consisted of candy, praise, or trips to the canteen. Negative reinforcement 
following aggressive acts consisted of ostracizing from group activities, 
taking away a favorite toy, or reprimanding verbally. Two observers were 
employed, one observing motor aggressive acts, the other, nonmotor 
aggressive acts. 

FIGURE 5-1 Number of motor aggressive, nonmotor aggressive, and total 
aggressive acts during on-reinforcement and off-reinforcement experimental 
conditions. 

The researchers concluded that reinforcement affected the amount 
of aggressive output. Motor aggressive behavior was reduced more 
effectively than nonmotor aggressive behavior (see Figure 5-1). To assess the 
permanence of behavior change after the conclusion of the experiment, a 
phase-out period of 89 days of observation was scheduled. The only re- 
inforcement used was the posting of stars for nonaggressive behavior. Ob- 
servations during the phase-out period indicated much more acceptable 
dormitory behavior. 

Designs of this type have a number of limitations. Although they may 
minimize the effect of history, it is possible that they may increase the 
influence of maturation, unstable instrumentation, testing, and 
experimental mortality. 


The equivalent materials, pretest, posttest design 


O1 XMA O2     O3 XMB O4 


XMA = teaching method A      XMB = teaching method B 
O1 and O3 are pretests       O2 and O4 are posttests 


Another experimental design, using the same group or class for both 
experimental and control groups, involves two or more cycles. The class 
may be used as a control group in the first cycle and as an experimental 
group in the second. The order of exposure to experimental and control 
conditions can be reversed, with the experimental cycle first and the 
control cycle following. 

Essential to this design is the selection of learning materials that are 
different but as nearly equal as possible in interest to the students and in 
difficulty of comprehension. An example may help to clarify the procedure. 

Ms. Smith hypothesized that the students in her class who were used 
to background music while doing their homework would learn to spell 
more efficiently in the classroom if music were provided. Because she was 
unable to arrange a parallel group experiment, she decided to use her class 
as both an experimental and a control group. 

To equate the words to be learned, she randomly selected two sets of 
100 words from an appropriate graded word list. For cycle I, the control 
cycle, she pretested the class on word list A. Then for 20 minutes each day 
the students studied the words, using drill and the usual spelling rules. At 
the end of 2 weeks she retested the class and computed the mean gain 
score in correct spelling. 

For cycle II, the experimental cycle, she pretested the class on word 
list B. Then for 20 minutes each day, with soft, continuous music in the 
background (the experimental condition), the students studied their word 
list, using the same drill and spelling rules. At the end of the second 2- 
week period she retested the class and computed the mean gain score in 
correct spelling. 

The mean gain score for the experimental cycle was significantly 
greater than the mean gain score for the control cycle. She concluded that 
the introduction of the experimental variable had indeed improved the 
effectiveness of the learning experience. 

The apparent simplicity and logic of this design are somewhat 
misleading, and when examined in light of the threats to experimental 
validity, the design's weaknesses become apparent. 


1. It is often difficult to select equated materials to be learned. For types 
of learning other than spelling, finding learning materials that are 
equally interesting, difficult, and unfamiliar would be a serious 
problem. 

2. As the students enter the second cycle, they are older and more 
mature. They also have more experience. 

3. Outside events (history) would be more likely to affect the experience 
in one cycle than in the other. 

4. There would be an influence of prior treatment carrying over from 
the first cycle to the second. 

5. The effects of testing would be more likely to have a greater impact 
on the measurement of gain in the second cycle. 

6. Mortality, or loss of subjects from the experiment, would be more 
likely in an experimental design spread over a longer period of time. 

7. When the experimenter’s judgment is a factor in evaluation, 
contamination (the experimenter's knowledge of subject performance in 
the first cycle) could influence the evaluation of performance in 
the second. 


Some of the limitations of the equivalent-materials, single-group, pre- 
test-posttest design can be partially minimized by a series of replications in 
which the order of exposure to experimental and control treatments is 
reversed. This process, known as rotation, is illustrated by this pattern in a 
four-cycle experiment. 


I             II            III           IV 


O1 X O2       O3 C O4       O5 C O6       O7 X O8 
O1 O3 O5 O7 = pretests      O2 O4 O6 O8 = posttests 


If the experimental treatment yielded significantly greater gains re- 
gardless of the order of exposure, its effectiveness could be accepted with 
greater confidence. However, it is apparent that this design is not likely to 
equate materials, subjects, or experimental conditions. 

All single-group experimental designs are sensitive to the influences 
of many of the threats to validity previously mentioned in this chapter: 
history, maturation, unstable instrumentation, testing, and experimental 
mortality. Replication of the studies, using different units as subjects, is an 
effective way to improve their validity. However, single-group experiments 
may be performed when randomly equated group designs cannot be ar- 
ranged. 


Counterbalanced designs. These are designs in which experimental 
control derives from having all the subjects receive all the treatment 
conditions. The subjects are placed into, in the case of this example, four 
groups. Each of the groups then receives all four treatments, but in 
different orders. This may be diagrammed as follows: 


Replication    O1 X1     O2 X2     O3 X3     O4 X4 O5 
1              Group A   B         C         D 
2              Group B   D         A         C 
3              Group C   A         D         B 
4              Group D   C         B         A 


In the first sequence following a pretest (O1), group A receives 
treatment 1, group B receives treatment 2, group C receives treatment 3, and 
group D receives treatment 4. After a second test (O2), each group then 
receives a second treatment, and so on. Thus each group receives all 
treatments, and each treatment is first, second, third, or fourth in the order 
received by one of the groups. 

This design has excellent internal validity because history, maturation, 
regression, selection, and mortality are all generally well controlled. The 
major limitation is that an order effect could wipe out any potential 
differences among the treatments. Four randomly assigned groups would 
therefore be preferable. Thus, this design should be used when random 
assignment is not possible and when it is expected that the different 
treatments will not interfere too much with each other. 
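
The counterbalancing property can be checked mechanically, as in the sketch below. The table is transcribed from the diagram above, with rows as replications and columns as treatments 1 through 4; the entry for treatment 2 in the fourth replication is illegible in this copy and is assumed here to be group C, the only value that completes the square. The function names are assumptions made for illustration.

```python
# Rows are replications; columns are treatments 1-4; entries are groups.
# The "C" in the last row is an assumed reading of an illegible entry.
replications = [
    ["A", "B", "C", "D"],
    ["B", "D", "A", "C"],
    ["C", "A", "D", "B"],
    ["D", "C", "B", "A"],
]

def treatment_orders(table):
    """For each group, list the treatments in the order received."""
    orders = {}
    for row in table:
        for treatment, group in enumerate(row, start=1):
            orders.setdefault(group, []).append(treatment)
    return orders

def is_counterbalanced(table):
    """True if every group appears once per replication and once under
    each treatment (a Latin square)."""
    size = len(table)
    groups = set(table[0])
    rows_ok = all(len(row) == size and set(row) == groups for row in table)
    columns_ok = rows_ok and all(
        {row[j] for row in table} == groups for j in range(size))
    return len(groups) == size and rows_ok and columns_ok

orders = treatment_orders(replications)
balanced = is_counterbalanced(replications)
print(orders, balanced)
```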


Factorial Designs 


When more than one independent variable is included in a study, whether 
a true experiment or a quasi-experiment, a factorial design is necessary. 
Because most real-world outcomes are the result of a number of factors 
acting in combination, most significant experimentation involves the anal- 
ysis of the interaction of a number of variable relationships. By using 
factorial designs, researchers can determine, for example, if the treatment 
interacts significantly with sex or age. That is, the experimenter can 
determine if one treatment is more effective with boys and another with girls, 
or if older girls do better on the treatment than younger girls, whereas 
older and younger boys do equally well on the treatment. 

The simplest case of a factorial design would be to have two 
independent variables with two conditions of each, known as a 2 x 2 factorial 
design. This design would be used if a researcher decided to compare a 
new (experimental) method of teaching reading to reading-disabled 
children with a commonly used (control) method, and also wanted to determine 
if boys and girls would do differently on the two methods. Such a design 
would look like Figure 5-2. 

FIGURE 5-2 Factorial design. [Columns: Experimental, Control.] 


With this design we have four cells, each of which represents a sub- 
group (for example, experimental females, control males, and so forth). 
This design will permit the researcher to determine if there is a significant 
overall effect, known as main effect, for treatment and/or sex. It also permits 
the determination whether these two variables interact significantly, such 
that boys do best in the experimental condition and girls do best in the 
control condition. If this were the case, the subjects in Cell 2 would have 
a higher average score than those in Cell 1, and the subjects in Cell 3 would 
outperform those in Cell 4. 
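
The main effects and the interaction in a 2 x 2 design can be read from the four cell means, as the following sketch shows. The cell means are hypothetical and the function name is an assumption; the sketch is descriptive only and omits the analysis of variance that would test these effects for significance.

```python
def factorial_effects(cells):
    """cells maps (sex, method) to the mean score of that cell."""
    girls_exp = cells[("girls", "experimental")]
    girls_ctl = cells[("girls", "control")]
    boys_exp = cells[("boys", "experimental")]
    boys_ctl = cells[("boys", "control")]
    # Main effect of treatment: experimental minus control, averaged over sex.
    treatment_main = (girls_exp + boys_exp) / 2 - (girls_ctl + boys_ctl) / 2
    # Main effect of sex: boys minus girls, averaged over treatment.
    sex_main = (boys_exp + boys_ctl) / 2 - (girls_exp + girls_ctl) / 2
    # Interaction: the treatment effect for boys minus that for girls.
    interaction = (boys_exp - boys_ctl) - (girls_exp - girls_ctl)
    return treatment_main, sex_main, interaction

# Hypothetical cell means in which boys do best under the experimental
# method and girls do best under the control method.
cell_means = {
    ("girls", "experimental"): 50,
    ("girls", "control"): 60,
    ("boys", "experimental"): 62,
    ("boys", "control"): 52,
}
effects = factorial_effects(cell_means)
print(effects)
```

With these made-up means the treatment main effect is zero even though the method matters a great deal within each sex, which is why the interaction has to be examined separately.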

Nucci and Nucci (1982) examined the responses of children to the 
social transgressions (such as spitting on the ground) of their peers. They 
observed boys and girls between 7 and 10 and between 11 and 14 years of 
age and coded their observations of the responses into one of eight cate- 
gories. They found an interaction effect of sex by age for just one of the 
categories. This interaction effect could be graphically represented as in 
Figure 5-3. We see that the two lines actually cross, thus clearly indicating 
that “With increased age the girls provided greater frequencies of ridicule 
responses to [social transgressions] while the boys responded with 
approximately the same frequencies as at the younger age” (Nucci & Nucci, 1982, 
p. 1341). Figure 5-4 shows an example of another type of response, stating 
the rule being violated, for which Nucci and Nucci found no interaction 
effect. Here we see two relatively parallel lines. 

FIGURE 5-3 Interaction effect (based on data from Nucci & Nucci, 1982). 
[Vertical axis: proportion of ridicule responses.] 

Of course, factorial designs can have more than two independent 
variables and more than two conditions of each variable. A study might 
have three treatment conditions (e.g., three methods of reading 
instruction), the two sexes, three age groups, and three intelligence levels 
(gifted, average, and mildly retarded) as the independent variables. This 
would be a 3 x 2 x 3 x 3 design and would have a total of 54 subgroups or cells. 
Such designs are too complex for this elementary treatment. We mention 
such a complex design only to make the reader aware that these designs 
exist and that they are frequently appropriate and necessary. Advanced 
students may wish to refer to such sources as Glass and Hopkins (1984), 
Kirk (1982), and Winer (1971) for more detailed information. 

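The count of 54 cells is the product of the number of levels of each factor. The short sketch below enumerates the cells; the labels for the methods and age groups are placeholders introduced here, while the intelligence levels are those named in the text.

```python
from itertools import product

# Levels of each independent variable. Method and age labels are
# placeholders; only the number of levels matters for the count.
factors = {
    "method": ["method 1", "method 2", "method 3"],
    "sex": ["girls", "boys"],
    "age_group": ["age group 1", "age group 2", "age group 3"],
    "intelligence": ["gifted", "average", "mildly retarded"],
}

# Every combination of one level from each factor is one cell.
cells = list(product(*factors.values()))
cell_count = len(cells)
print(cell_count)
```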
FIGURE 5-4 No interaction effect (based on data from Nucci & Nucci, 1982). 

This discussion, which has examined the many limitations of the 
experimental method in behavioral research, may convey a sense of futility. 
As is true in many other areas of significant human endeavor, researchers 
do not work under ideal conditions. They must do the best they can under 
existing circumstances. They will find, however, that in spite of its 
limitations, the well-designed and well-executed experiment provides a 
legitimate method for testing hypotheses and making probability decisions 
about the relationships between variables. 

Some variables cannot be manipulated. The ethical problems that 
would be raised if certain other variables were manipulated indicate a place 
for such nonexperimental methods as ex post facto research. The researcher 
starts 
with the observation of dependent variables and goes back to the obser- 
vation of independent variables that have previously occurred under un- 
controlled conditions. Such studies are not experiments, for the researcher 
has had no control over the events; they occurred before he or she began 
the investigation. The description of cigarette-smoking cancer research in 
Chapter 4 is an example of ex post facto research. 


SUMMARY 


The experimental method provides a logical, systematic way to answer the 
question, “If this is done under carefully controlled conditions, what will 
happen?” To provide a precise answer, experimenters manipulate certain 
influences, or variables, and observe how the condition or behavior of the 
subject is affected or changed. Experimenters control or isolate the variables 
in such a way that they can be reasonably sure that the effects they observe 
can be attributed to the variables they have manipulated, rather than to 
some other uncontrolled influences. In testing hypotheses or evaluating 
tentative answers to questions, experimenters make decisions based upon 
probability rather than certainty. Experimentation, the classic method of 
the laboratory, is the most powerful method for discovering and developing 
a body of knowledge about the prediction and control of events. The 
experimental method has been used with some success in the school 
classroom, where, to some degree, variables can be controlled. 

The early applications of experimental method, based upon John 
Stuart Mill's law of the single variable, have been replaced by the more 
effective applications of factorial designs made possible by the contributions 
of R. A. Fisher. His concept of equating groups by random selection of 
subjects and random assignment of treatments, and his development of 
the analysis of variance and the analysis of covariance, have made possible 
the study of complex multivariate relationships that are basic to the 
understanding of human behavior. 

Experimenters must understand and deal with threats to the internal 
validity of the experiment so that the variable relationships they observe 
can be interpreted without ambiguity. They must also understand and deal 
with threats to the external validity of the experiment so that their findings 
can be extended beyond their experimental subjects and generalized to a 
wider population of interest. 

Experimental design provides a plan or blueprint for experimentation. 
Three pre-experimental, three true experimental, and five 
quasi-experimental designs have been presented, and their appropriate use, 
advantages, and disadvantages have been briefly discussed. 

Experimentation is a sophisticated technique for problem solving and 
may not be an appropriate activity for the beginning researcher. It has 
been suggested that teachers may make their most effective contribution 
to educational research by identifying important problems that they 
encounter in their classrooms and working cooperatively with research 
specialists in the conduct and interpretation of classroom experiments. 


EXERCISES 


1. Why is it more difficult to control extraneous variables in a classroom experiment 
than in a pharmaceutical laboratory experiment? 

2. What significant element distinguishes a quasi-experiment from a true exper- 
iment? 

3. Why is an ex post facto study not an experiment? 

4. A researcher, in proposing a research project, defines the dependent variable 
as achievement in mathematics. What difficulty does this definition present? 
How would you improve it? 

5. How could a double blind be applied in an educational experiment? 

6. Under what circumstances could an independent variable in one study be a 
dependent variable in another study? 

7. Why is randomization the best method for dealing with extraneous variables? 

8. How could a high degree of experimental mortality seriously affect the validity 
of an experiment? 

9. Read the report of an experiment in an educational research journal. 

Was the problem clearly stated? 
Were the variables defined in operational terms? 
Was the hypothesis clearly stated? 
Were the delimitations stated? 
Was the design clearly described? 
Were extraneous variables recognized? What provisions were made to 
control them? 
Were the population and the sampling methods described? 
Were appropriate methods used to analyze the data? 
Were the conclusions clearly presented? 
Were the conclusions substantiated by the evidence presented? 


SINGLE-SUBJECT 
EXPERIMENTAL 
RESEARCH 


The research designs just described in Chapter 5 all have one common 
characteristic. They all are used to study group behavior and change. Sin- 
gle-subject research, also sometimes referred to as single-case or N of one 
research, is a particular type of experimental research. Its distinguishing 
feature is the rigorous study of the effect of interventions on an individual. 
While the focus of this type of study is the individual subject, most of these 
studies include more than one subject. When there are multiple subjects, 
the data still are analyzed separately for each subject rather than as a group 
as would be done in the designs described in Chapter 5. 

While there are many fine books on the topic of single-subject re- 
search, two which the authors find particularly useful are Barlow and 
Hersen (1984) and Kazdin (1982). The structure of this chapter and, where 
indicated, the content, were influenced by these two superb texts. We rec- 
ommend these texts to anyone wishing an in-depth coverage of single- 
subject research. 

As with experimental research in general, single-subject research is a 
method of testing hypotheses. It also is prone to many of the same threats 
to internal and external validity to which other research designs are subject. 
Many critics of single-subject research question its external validity, in 
particular its ability to generalize to other subjects. Proponents 
point out that this is a problem of most research. They question whether 
group analyses are useful for determining an individual's treatment. They 
argue that just because the experimental group outgained the control group 
on the average does not mean that every person receiving the experimental 
treatment outgained every person in the control group, or even that everyone 
in the experimental group improved. 
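
The proponents' argument can be made concrete with a small sketch. The gain scores are hypothetical and the function name is an assumption; the point is only that a higher group average can coexist with individuals who did not improve.

```python
from statistics import mean

def compare_groups(experimental_gains, control_gains):
    """Contrast the group-level result with the individual-level picture."""
    return {
        "group_outgained": mean(experimental_gains) > mean(control_gains),
        "every_exp_beats_every_ctl":
            min(experimental_gains) > max(control_gains),
        "all_exp_improved": all(gain > 0 for gain in experimental_gains),
    }

# Hypothetical gain scores (illustration only).
experimental = [12, 9, -2, 15, 0]
control = [3, 4, 5, 2, 6]
summary = compare_groups(experimental, control)
print(summary)
```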

The decision to use a single-subject research design depends, as does 
the selection of any research design, on the purpose of the study, the 
population of interest, and the situation in which the study is to be con- 
ducted. Single-subject research designs are particularly useful in the study 
of behavior modification. Most, if not all, behavior modification research 
uses single-subject designs. In fact, this type of research and the 
methodology are so often used together that many people confuse the two. 
Behavior modification research studies the effect of a certain type of 
intervention, operant conditioning, on individuals. Single-subject research 
is a methodology that can be applied to a variety of research topics. 

The case study method described in Chapter 4 is the clinical, descrip- 
tive foundation from which the experimental study of single-subjects de- 
veloped. "The development of single-case research, as currently practiced, 
can be traced to the work of B.F. Skinner (b. 1904), who developed pro- 
grammatic animal laboratory research to elaborate operant conditioning" 
(Kazdin, 1982, p. 10).' Skinner's (1938, 1953) research methodology, known 
as the experimental analysis of behavior, included certain features that are 
characteristic of single-subject research today. He included only one or a 
few subjects in each of his studies. He used the subject as its own control 
by changing the intervention presented to the subject and studied the 
impact of the changes on the subject. Skinner was also very interested in 
the frequency with which a behavior occurred under various conditions 
(Kazdin, 1982).

Beginning in the 1950s, a number of investigators adapted Skinner's 
operant approach and methodology of the experimental analysis of be- 
havior to humans. The early laboratory research produced findings that 
indicated the clinical utility of operant conditioning with a variety of populations
(e.g., autistic children, mentally retarded persons, psychiatric patients).
Thus was born the field of applied behavior analysis with its own
journal, the Journal of Applied Behavior Analysis, first published in 1968. Most 
of the research published in this journal uses single-subject research meth- 
ods. In the last two decades, an increasing number of studies using this 
methodology for operant conditioning and other research topics has appeared
in a variety of journals. Single-subject designs are similar to three
of the quasi-experimental designs described in Chapter 5: the time-series
design, the equivalent samples design, and the equivalent materials
pretest-posttest design. Each of these designs includes some change in the
conditions applied to the subjects with repeated observations or measurements.
The major difference between these designs and single-subject research
designs is that these quasi-experimental designs are used with a group of
subjects and the data are analyzed accordingly, while single-subject research
is concerned with individuals.


¹All quotes from Kazdin (1982) used with permission of Oxford University Press.

Single-subject research requires careful assessment, repeated obser- 
vations or measurements, and careful control and applications of the ex- 
perimental treatment. This chapter will address these issues and describe 
the most common designs. 


GENERAL PROCEDURES 


Repeated Measurement 


One aspect of single-subject research is the repeated measurement or ob- 
servation of the individual. The purpose is obvious: to determine if changes 
in the experimental conditions effect changes in the subject. The careful, 
systematic use of these repeated observations is critical in order to assure 
reliable and valid data. 

The measurement to be used must be clearly defined. If, as is com- 
mon, the procedure is observation, the behaviors to be observed must be 
carefully defined and observable. The researcher must also be careful in 
selecting the behaviors to be observed. In particular, the behaviors must 
be ones that the subject would normally be expected to exhibit with a 
reasonable degree of frequency. 

If the measurement procedure includes tests, surveys, or attitude 
scales, the researcher must select instruments that can be used repeatedly 
without the contamination of test or test-interaction effects. Since elimi- 
nation of the test and test-interaction effects is often impossible, observation 
is the primary measurement tool in single-subject research studies. 

The measurements also must be used under completely standardized 
conditions. The researcher needs to use the same measurements, or ob- 
servation procedures, for each replication of the measurement. Where 
possible, the same observers or test givers should be used for all measure- 
ments. When this is not possible, the researcher should demonstrate reli- 
ability of the measurements across the various personnel used. The meas- 
urements should take place under the same conditions each time they are 
conducted. Conditions that should be standardized across measurements 
include the time of day, the circumstances (e.g., during a certain lesson 
such as spelling), and the general surroundings (e.g., location, others present)
in which the measurements take place.


Baselines


The baseline in single-subject research is analogous to a pretest in group
research designs. Baseline data generally are collected by observing the 
aspect of the individual's behavior that is under study at several times prior 
to intervention. Since a baseline is used to determine the status of the 
subject's behavior prior to the intervention and to predict what the immediate
future behavior would be if no intervention were implemented, the
baseline must be long enough to determine the trend in the data. That is,
the baseline should demonstrate a stable rate, an increasing rate, or a
decreasing rate of the behavior to be modified. Figure 6-1 provides hypothetical
data showing a stable, an improving (increasing), and a worsening
(decreasing) rate of appropriate behavior. Since the purpose of the intervention
would be to increase the rate of appropriate behavior, only the
baseline showing an increasing trend (the middle panel) would present a
serious problem in evaluating the effectiveness of the intervention. This
trend would be a problem because the baseline is already showing a trend
in the desired direction.


FIGURE 6-1 Hypothetical baseline data for attending behavior. Top panel shows stable, middle panel
increasing, and bottom panel decreasing trend in the behavior.
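
The idea of judging whether a baseline is stable, increasing, or decreasing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the hypothetical data, and the tolerance of one percentage point per session are not from the text, and the least-squares slope is one of several reasonable ways to summarize a trend. In practice the judgment is usually made by inspecting the graph.

```python
from statistics import mean


def baseline_slope(observations):
    """Least-squares slope of the observations against session number 1, 2, 3, ..."""
    n = len(observations)
    if n < 3:
        # Assumption for this sketch: require at least three baseline observations.
        raise ValueError("a baseline needs at least three observations")
    sessions = range(1, n + 1)
    x_bar = mean(sessions)
    y_bar = mean(observations)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(sessions, observations))
    denominator = sum((x - x_bar) ** 2 for x in sessions)
    return numerator / denominator


def classify_trend(observations, tolerance=1.0):
    """Label a baseline; `tolerance` (units per session) is an arbitrary cutoff."""
    slope = baseline_slope(observations)
    if slope > tolerance:
        return "increasing"
    if slope < -tolerance:
        return "decreasing"
    return "stable"


# Hypothetical percentages of attending behavior across five baseline sessions.
stable = [50, 52, 49, 51, 50]
improving = [30, 35, 42, 48, 55]
worsening = [55, 48, 42, 35, 30]

for label, data in [("stable", stable), ("improving", improving), ("worsening", worsening)]:
    print(label, round(baseline_slope(data), 2), classify_trend(data))
```

When the goal is to increase the behavior, a baseline classified here as "increasing" is the case that makes the intervention hard to evaluate.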

A baseline must include a minimum of three separate observations 
but will often include from five to eight, or even more, observations. The 
length of the baseline is determined by a number of factors. Ideally for 
research purposes, the baseline will continue until a stable trend, with a 
minimum of variability, is established. However, ethical considerations may 
shorten the baseline from the optimal to the minimum acceptable. For 
instance, the researcher working on correcting the self-abusive behavior of 


Manipulating Variables 


A fundamental principle of any type of research, particularly true of single- 
subject research, is that only one variable should be manipulated, or changed, 
at any given time. When two or more variables are manipulated during 
the same phase of a single-subject study, the effect of each cannot be 
separated. 

For instance, in dealing with a hyperactive child, a researcher might 
want to study the effects of medication and of operant conditioning. To
do such a study properly, the researcher should follow the baseline with one
of the interventions or treatments, let's say the medication. After a period 
of time with the medication, the treatment should be removed and the 
baseline repeated. Following the second baseline, the researcher would 
introduce the second intervention, operant conditioning, followed by a 


the effectiveness of the two interventions. (Ideally, two subjects should be 
used, with the order of treatments reversed, so as to control for any possible 
order effect.) This design would be an A-B-A-B-A design (“A” represents 
baseline or no intervention and “B” represents an intervention). 

If the researcher in the above study had introduced both treatments, 
medication and operant conditioning, at the same time, with a baseline 
before and after, the relative effect of each treatment would not be discernible.
While the design looks appropriate on the surface, as an A-B-A
design (baseline, intervention, baseline), the manipulation of two variables
in the same phase makes it uninterpretable.


Length of Phases 


When considering the individual length of phases independently of other 
factors (e.g., time limitations, ethical considerations, relative length of phases),
most experimenters would agree that baseline and experimental conditions
should be continued until some semblance of stability in the data is apparent.
(Barlow & Hersen, 1984, p. 96)²


That is, the data collection in each phase should continue until there is no 
upward or downward trend and a fairly constant level of variability between 
data collection points. This would obviously result in the phases of a typical
study (baseline, intervention, baseline, intervention, or A-B-A-B) being
radically different in length (Barlow & Hersen, 1984).

On the other hand, Barlow and Hersen (1973) have pointed out 
problems with having unequal phases and "cited the advantages of obtaining
a relatively equal number of data points for each phase" (Barlow & Hersen,
1984, p. 96). Obviously, some compromises must be made between these 
two often competing ideals, stability of each phase and equal phase length. 
For instance, in some cases it may be necessary for the first intervention 
to be longer than the initial baseline in order to demonstrate a behavioral 
change. In such a case, the subsequent phases, second baseline and inter- 
vention, should be the same length as the first intervention in order to 
replicate the changes in behavior. “Where possible, the relative equivalence 
of phase lengths is desirable” (Barlow & Hersen, 1984, p. 97). 

A potential problem, that is sometimes related to the length of the 
intervention phase, is a carryover effect. A carryover effect is found when 
the effect of the intervention continues into the next phase, withdrawal. 
The purpose of the withdrawal phase is to support the effectiveness of the 
intervention by demonstrating that the effect disappears (or is at least 
reduced) when the treatment is removed. In the typical A-B-A-B design, 
the treatment is then reintroduced and the effect reappears, thus clearly 
demonstrating the effectiveness of the treatment. If the intervention effect 
carries over to the withdrawal phase (second baseline), there are plausible 
alternate hypotheses for the behavioral improvement that occurred during 
the intervention phase (e.g., maturation, history).

Bijou, Peterson, Harris, Allen, and Johnston (1969) recommend short 
interventions to prevent carryover effects “since long ones might allow 
enough time for the establishment of new conditioned reinforcers” (p. 202). 
Thus, once an effect has been demonstrated, the withdrawal phase should 
be introduced right away. Barlow and Hersen (1984) suggest alternating 
treatment designs (discussed later in this chapter) and counterbalancing 
procedures as ways to prevent carryover effects from obscuring the results. 


Transfer of Training and Response Maintenance 


Transfer of training to other situations, settings, or behaviors is of obvious 
importance in applied behavior analysis. If a teacher eliminates an undesirable
behavior in his or her classroom but the behavior continues elsewhere,
the instructional program has limited success. Barlow and Hersen
(1984) and Kazdin (1982) suggest a number of design options that are
useful in providing for, and studying, the transfer of intervention effects.


²All quotes from Barlow and Hersen used with the permission of Pergamon Books Ltd.

Similarly, keeping the undesirable behavior from recurring as soon 
as the reinforcement schedule is eliminated or changed is also relevant. A 
child must learn to behave acceptably without receiving tangible reinforce- 
ments for the rest of his or her life. Thus, maintenance of positive behav- 
ioral responses or of the elimination of undesirable responses is a prime 
purpose of the practitioner. Various reinforcement schedules result in more 
or less maintenance of the intervention effect. The reader should consult 
one of several fine texts (e.g., Alberto & Troutman, 1986; Cooper, Heron, 
& Heward, 1987; Sulzer-Azaroff & Mayer, 1977) for a detailed discussion 
of reinforcement schedules and response maintenance. Barlow and Hersen 
(1984) describe design strategies that also are useful in studying and ef- 
fecting response maintenance. 


ASSESSMENT 


Assessment of the effect of the intervention(s) in single-subject research is 
usually accomplished by observing the behaviors under study. Chapter 7 
includes a section on the use of observation as a method of data collection.
However, the assessment of behavioral change is so central to the issue of
single-subject research that certain aspects, primarily relevant to this topic,
will be briefly described here. The texts by Barlow and Hersen (1984) and
Kazdin (1982) contain a great deal more detail than can be covered in this 
introductory treatment. 


Target Behavior 


The target behavior or focus of the research is usually determined by the 
research or real problem. If the problem involves the elimination of in- 
appropriate (e.g., violent, disruptive) behaviors in the classroom, then the 
target behaviors will obviously be the inappropriate behaviors displayed. 
The researcher may need to observe the situation for a period of time 
prior to implementing the study, in order to determine the precise nature 
of the behaviors (e.g., hitting other children, calling out, throwing spitballs).

Once the researcher fully understands the behavior(s) to be changed, 
the target behavior needs to be operationally defined. The definition should 
only refer to observable aspects of the behavior. Avoid references to intent 
or other unobservable components. The definition should be clearly worded 
for easy, unambiguous, nonsubjective understanding. The definition also 
needs to completely define the outer boundaries of the behavior under 
study (Barlow & Hersen, 1984).

The purpose of assessing the target behavior is


[first to determine] the extent to which the target behavior is performed 
before the program [intervention] begins. The rate of preprogram behavior 
is referred to as the baseline or operant rate. Second, assessment is required to 
reflect behavior change after the intervention is begun. Since the major pur- 
pose of the program is to alter behavior, behavior during the program must 
be compared with behavior during baseline. (Kazdin, 1982, pp. 23-24)


Data Collection Strategies 


As stated earlier, the major data collection procedure used in single-subject 
research is observation of overt behaviors. There are a number of ways to 
measure such behaviors. 

A frequency measure is simply a count of the number of occurrences of 
the behavior that are observed during a given period of time. If a teacher 
wants to know how frequently a particular student talks without permission, 
he or she may simply count the number of occurrences during a given class 
period. This type of measure is relatively easy and is most useful when the 
occurrences of the behavior are all of about the same length of time. More
than one behavior is sometimes counted in this procedure (e.g., talking
to other children and on-task behavior). 

A time-based measure of overt behavior is duration. In this method, 
the actual amount of time during which the individual performs the behavior
is determined. If an instructional program is designed to teach a
mentally retarded student to perform an already mastered task more rap- 
idly, the teacher would want a measure of the duration of the task per- 
formance. 

Another time-based measure is time sampling or interval recording. In 
this method, the observation period, such as a class period, is divided into 
brief observation/nonobservation intervals. In a study designed to decrease 
inappropriate behavior, the observer might observe the child every 30 
seconds for a 15-second interval followed by a 15-second nonobservation 
period for recording the observed behaviors. This method is frequently 
used but is considered to have serious flaws (Barlow & Hersen, 1984). 
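
The three measures described so far (frequency, duration, and interval recording) can be compared on the same record of behavior. The sketch below is illustrative only: the event log is hypothetical, the function names are invented for this example, and scoring an interval as "occurred" when the behavior appears at any point in the observation window is an assumption (a partial-interval rule), not a rule given in this chapter. The 30-second cycle with a 15-second observation window follows the example in the preceding paragraph.

```python
def frequency(events):
    """Number of separate occurrences of the behavior."""
    return len(events)


def total_duration(events):
    """Total seconds during which the behavior was performed."""
    return sum(end - start for start, end in events)


def interval_record(events, session_length, cycle=30, observe=15):
    """One True/False score per cycle: observe for `observe` seconds, then record.

    An interval is scored True if any occurrence overlaps its observation
    window (assumed partial-interval scoring).
    """
    record = []
    for window_start in range(0, session_length, cycle):
        window_end = window_start + observe
        occurred = any(start < window_end and end > window_start for start, end in events)
        record.append(occurred)
    return record


# Hypothetical log for a 120-second session: (start, end) of each occurrence, in seconds.
events = [(5, 12), (20, 28), (70, 95)]

record = interval_record(events, session_length=120)
print(frequency(events))        # 3 occurrences
print(total_duration(events))   # 40 seconds
print(record)                   # [True, False, True, True]
print(100 * sum(record) / len(record))  # 75.0 percent of intervals
```

Note that the occurrence from 20 to 28 seconds falls entirely within a nonobservation period and is missed by interval recording, which illustrates one way the three measures can disagree.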

The final method to be described here is real-time observation. In this 
procedure, behaviors are recorded in their actual frequency, duration, and 
order. This is an excellent method, but it is rarely used because of the need 
for expensive recording equipment. 

The strategies mentioned thus far are useful for overt behavior. For 
research on behaviors that are not overt, other measures are needed. For 
a study on weight reduction, the data might include a count of calories 
consumed and of distance walked in a day. These data could be the totals 
for each day using a calorie counter and a pedometer. These types of 
measures are called response-specific measures by Kazdin (1982). Other types
of measures used in single-subject research include psychophysiological
(e.g., pulse, skin temperature) and self-reports.

In single-subject research, the researcher must be able to demonstrate 
the reliability and validity of the measures used. For instance, do two ob- 
servers count the same overt behavior in the same way? What of the effect 
of the observer's presence on the person(s) being observed? These issues 
are addressed in Chapter 7. 
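
One simple way to ask whether two observers record the same behavior in the same way is to compare their interval-by-interval records. The sketch below computes a plain percentage of agreement. This index is an assumption introduced for illustration and is not prescribed in this chapter; the function name and the observer records are hypothetical.

```python
def percent_agreement(observer_a, observer_b):
    """Percentage of intervals on which two observers gave the same score."""
    if len(observer_a) != len(observer_b):
        raise ValueError("both observers must score the same number of intervals")
    if not observer_a:
        raise ValueError("at least one interval is required")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * agreements / len(observer_a)


# Hypothetical records: True means the behavior was scored as occurring.
observer_a = [True, False, True, True, False, True, False, False, True, True]
observer_b = [True, False, True, False, False, True, False, True, True, True]

print(percent_agreement(observer_a, observer_b))  # 80.0
```

A simple percentage does not correct for agreement expected by chance, so it is best read as a rough first check.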


BASIC DESIGNS 


There are two fundamental types of designs that are used in single-subject 
research: A-B-A and multiple baseline. Each of these design types will be
described and an example of each will be presented. Other designs that 
are too complex for this elementary discussion will be mentioned and texts 
suggested for those interested in more detail. 


A-B-A Designs 


As with all single-subject designs, “A” represents a series of baseline meas- 
urements and “B” represents a series of measurements occurring during 
the treatment. Thus, A-B-A includes three phases, baseline, intervention, 
and withdrawal (baseline), each of which represents a series of measure- 
ments. Most research studies of this type are more complex than the most 
basic A-B-A design. More often than not, the intervention is reintroduced 
after the withdrawal phase resulting in an A-B-A-B design. While additional 
baselines and/or treatment phases may be added, further complicating the 
design, the most common of these designs is the A-B-A-B. 

The A-B-A-B design is analogous to the equivalent time-samples de- 
sign described in Chapter 5. The primary difference is that the A-B-A-B 
design assumes continuous measurement of the behavior(s) being studied 
and the analysis of individual subjects' data. The equivalent time-samples 
design may include continuous measurement or finite times for the meas- 
urement, and the data are analyzed for the group of subjects. 

The A-B-A-B design permits a careful examination of the effects of 
intervention. Kazdin (1982) puts it quite well: 


The ABAB design examines the effects of an intervention by alternating the 
baseline condition (A phase), when no intervention is in effect, with the 
intervention condition (B phase). The A and B phases are repeated again to 
complete the four phases. The effects of the intervention are clear if per- 
formance improves during the first intervention phase, reverts to or ap- 
proaches original baseline levels of performance when treatment is with- 
drawn, and improves when treatment is reinstated in the second intervention 
phase. (p. 110)


FIGURE 6-2 Hypothetical data showing an effective intervention for increasing appropriate behavior in an
A-B-A-B design.


A critical aspect of the expected events described above is that the direction 
of the behavior changes each time the intervention is introduced or withdrawn.
Thus, the actual behavior differs from what would have been expected
if the conditions were not changed. Figure 6-2 shows what the
graph of such data might look like. Clearly the intervention was effective 
in this hypothetical example. 
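
The expected pattern in an A-B-A-B design (improvement, reversal, improvement) can be expressed as a small check on the phase means. This is a simplified sketch under stated assumptions: it compares only average levels, the function name and data are hypothetical, and it is not a substitute for inspecting the graphed data.

```python
from statistics import mean


def abab_pattern_holds(phases, desired="increase"):
    """Check that phase means move as expected across A1, B1, A2, B2.

    `phases` is a list of four lists of observations. With desired="increase",
    the mean should rise from A1 to B1, fall from B1 to A2, and rise again
    from A2 to B2. With desired="decrease", the directions are reversed.
    """
    if len(phases) != 4:
        raise ValueError("expected four phases: A1, B1, A2, B2")
    means = [mean(phase) for phase in phases]
    sign = 1 if desired == "increase" else -1
    changes = [sign * (means[i + 1] - means[i]) for i in range(3)]
    return changes[0] > 0 and changes[1] < 0 and changes[2] > 0


# Hypothetical percentages of appropriate behavior in each phase.
effective = [[20, 22, 21], [50, 55, 60], [30, 28, 25], [58, 62, 65]]
no_reversal = [[20, 22, 21], [50, 55, 60], [58, 60, 62], [63, 65, 66]]

print(abab_pattern_holds(effective))    # True
print(abab_pattern_holds(no_reversal))  # False
```

The second data set shows steady improvement that continues through the withdrawal phase, so the change cannot be attributed to the intervention with any confidence.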

Fantuzzo and Clement (1982) used an A-B-A-B design to study the 
effect of the reinforcement given to one student upon other students. While 
the study included a number of conditions and subjects, for the purposes 
of this discussion we will concentrate on just one aspect of the study. In 
this situation, “Al” was to reinforce himself every 60 seconds if he was 
attending to his assigned task. “Ed” was able to observe Al and to behave 
similarly. At the end of each session Al was able to select edible rewards 
based on the number of points he had awarded himself. Ed was not offered 
the edible reward regardless of his behavior or the number of points he 
awarded himself. The actual percentages of attentive behavior for Al and 
Ed are given in Figure 6-3. As can be seen, the treatment was effective
with both students even though only Al received the edibles. The withdrawal
and second intervention were also successful in affecting behavior
in the directions expected. Each time the conditions changed, A to B, B to 
A, and A to B again, the direction of the behaviors changed. Thus, the
study demonstrated successful generalization of reinforcement from one
student to another.


FIGURE 6-3 (... Fantuzzo & Clement [1981] with permission of the
authors. Copyright © by the Society for the Experimental Analysis of
Behavior, Inc.) Phases: A-B-A-B. Vertical axis: Percentage of "Attending" Intervals.


Multiple Baseline Designs 


The designs described in this section are quite different from the A-B-A 
designs just considered. In A-B-A designs, the intervention effect is dem- 
onstrated by withdrawal and, usually, reintroduction of the intervention. 
In multiple baseline designs, the intervention effect is demonstrated by 
having more than one baseline. Here each baseline represents a different 
person, setting, or behavior, the three principal variations of this
type of design. The subsequent baselines (e.g., for the second and third
behaviors) are longer than the previous baselines and extend into the previous
ones' interventions. Figure 6-4 provides an example, using hypothetical
data, of a typical multiple baseline design with three subjects. As
can be seen, each subject shows improvement only after the intervention 
is introduced to that subject. 

Multiple baseline designs actually are replication designs. If each sub- 
ject or behavior shows the same pattern of response to the treatment, only 
when the treatment is applied to that subject or behavior, there is strong 
evidence of a true treatment effect. By extending the second subject's
baseline until after an intervention effect is demonstrated for the first
subject, the researcher controls for maturation, history, and other threats
to the internal validity of the study. In addition, by demonstrating the 
treatment effect with more than one subject, the researcher demonstrates 
generalizability to other subjects. Likewise, multiple baseline designs that 
use multiple behaviors or multiple settings also control for various threats 
to internal validity and demonstrate generalizability of the treatment to
other behaviors or settings.

McGee, Krantz, and McClannahan (1986) studied the effect of a particular
teaching approach, incidental teaching, on the learning of sight
words by an autistic child. They used a multiple baseline design across three 
sets of words and added a follow-up phase to check for longer-term effects.
As can be seen in Figure 6-5, for each set of words, the percent of correct
responses began to improve only when the treatment was implemented on
that set of words. The follow-up at 15 and 25 days also indicated retention
of the learned material. Clearly the treatment was effective and threats to
the internal validity of the study were well controlled.


FIGURE 6-4 Hypothetical data showing an effective intervention to increase appropriate behavior using a
multiple baseline design.


Other Designs 


In addition to A-B-A and multiple baseline designs, a number of additional 
options are available to the researcher. Alternating treatment or multiple treat- 
ment designs permit the researcher to compare two or more treatments 
while controlling for possible order effects. In these designs, the researcher 
alternates treatments for each session or randomly assigns the sessions to 
each treatment. In the first case, with two treatments, the researcher would
simply use treatment 1 in the first postbaseline session, treatment 2 in the
second, and so on. With random assignment, the researcher decides on the
number of intervention sessions and randomly assigns each session to a
treatment. Thus, with two treatments and ten sessions, the order of treatment
might be 1-1-2-1-2-2-1-2-2-1. These procedures permit a clear comparison
of two or more treatments. Those wishing more details regarding
this type of design should refer to Barlow and Hersen (1984) or Kazdin
(1982).


FIGURE 6-5 Percent of correct responses on acquisition probes during baseline, incidental teaching, and
at 15- and 25-day follow-ups. (Adapted from McGee, Krantz, & McClannahan [1986] with
permission of the authors. Copyright © the Society for the Experimental Analysis of Behavior,
Inc.)
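
The two ways of ordering sessions in an alternating treatment design, strict alternation and random assignment, can be sketched as follows. The function names are hypothetical. The random version gives each treatment an equal number of sessions, as in the ten-session example in the text; that balancing is an assumption of this sketch rather than a stated requirement.

```python
import random


def strict_alternation(n_sessions, treatments=(1, 2)):
    """Treatment 1 in the first session, treatment 2 in the second, and so on."""
    return [treatments[i % len(treatments)] for i in range(n_sessions)]


def random_assignment(n_sessions, treatments=(1, 2), seed=None):
    """Randomly order sessions, giving each treatment an equal number of sessions."""
    if n_sessions % len(treatments) != 0:
        raise ValueError("n_sessions must be a multiple of the number of treatments")
    order = list(treatments) * (n_sessions // len(treatments))
    rng = random.Random(seed)  # a seed makes the schedule reproducible
    rng.shuffle(order)
    return order


print(strict_alternation(6))            # [1, 2, 1, 2, 1, 2]
print(random_assignment(10, seed=42))   # a random order with five sessions of each treatment
```

Fixing the schedule before data collection begins, rather than deciding session by session, keeps the researcher's expectations from influencing which treatment is given when.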

Researchers also combine the various designs discussed into even more 
intricate designs. Kazdin (1982) describes a number of options for doing 
this. Barlow and Hersen (1984) suggest the use of replication in applied 
research and describe a number of strategies for carrying out replications
appropriately.


EVALUATING DATA


In studies comparing the performances of two or more groups of subjects, 
a statistical test of the differences between the groups is the usual method
used for analyzing the effects of the experimental condition. In single-subject
research, however, statistical analyses are rarely used. Visual inspection
of the data is the method most commonly used to evaluate the
effect of the treatment in single-subject studies.

In single-subject designs, the approach is to see if the effect is replicated
at the appropriate point. In an A-B-A-B design, the effect should
replicate at the beginning of each new phase, the change from A to B,
from B to A, and from A to B again. In a multiple baseline design, the 
effect should replicate across subjects, behaviors, or settings by occurring 
at each point that the treatment is applied (Kazdin, 1982). 

Visual inspection is relatively easy in cases where there are major
changes in the behavior. For instance, if the behavior never occurs during 
baseline and occurs frequently during the intervention, an effect is obvious. 
However, this is not the usual case so we must have predetermined char- 
acteristics of the data to use in evaluating whether an effect occurred. 
Kazdin (1982) suggests two types of change, magnitude and rate, that can 
be judged. He further suggests using changes in the average rate of per- 
formance and in the level at the change point to assess the magnitude of 
the change. The average rate of performance is simply the number of 
occurrences divided by the number of sessions. A line can be superimposed 
on the graph of the data to show any changes. A change in the level refers 
"to the shift or discontinuity of performance from the end of one phase 
to the beginning of the next phase" (Kazdin, 1982, p. 234).

Kazdin (1982) proposes using changes in trend and latency to assess
changes in the rate of the behavior under study. The trend of the data can
be measured by the slope and is the "tendency for the data to show systematic
increases or decreases over time" (p. 235) or to show no change at
all (preferable for baseline data). The latency of the changes refers to how 
quickly the change occurs after beginning the intervention or withdrawal 
phase. Obviously, the more rapidly a change occurs, the better evidence 
for the treatment having caused the change. 

Thus, in evaluating a subject’s data, the researcher looks to see if the 
average performance changes between phases, if a shift in the rate of the 
behavior occurs between the phases, if the slopes of the data lines are in 
different directions for the different phases, and how quickly a change 
occurs after the intervention or withdrawal is introduced. These and other 
characteristics discussed by Kazdin (1982) are used to determine if the 
treatment was effective in changing behavior. 
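
The characteristics just listed (change in average performance, change in level at the phase boundary, slope within each phase, and latency) can be computed from two adjacent phases as a supplement to visual inspection. The sketch below is illustrative: the names and data are hypothetical, and latency is operationalized here as the number of sessions in the new phase before the first observation falls outside the range of the preceding phase, which is an assumption rather than a definition taken from Kazdin (1982).

```python
from statistics import mean


def slope(values):
    """Least-squares slope of the values against session index 0, 1, 2, ..."""
    if len(values) < 2:
        raise ValueError("at least two observations are needed for a slope")
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / denominator


def compare_phases(previous, current):
    """Summarize the change from one phase to the next."""
    low, high = min(previous), max(previous)
    # Assumed latency rule: first session of the new phase outside the old range.
    latency = next((i + 1 for i, v in enumerate(current) if v < low or v > high), None)
    return {
        "mean_change": mean(current) - mean(previous),  # change in average performance
        "level_change": current[0] - previous[-1],      # shift at the change point
        "slope_previous": slope(previous),              # trend before the change
        "slope_current": slope(current),                # trend after the change
        "latency": latency,                             # None if no such session
    }


# Hypothetical data: baseline (A) followed by intervention (B).
baseline = [20, 22, 19, 21, 20]
intervention = [35, 45, 55, 60, 65]

summary = compare_phases(baseline, intervention)
for name, value in summary.items():
    print(name, value)
```

In this hypothetical case the level shifts by 15 points at the change point, the mean rises by about 31.6 points, the slope goes from nearly flat to clearly positive, and the change appears in the first intervention session.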


SUMMARY


Single-subject experimental research differs from other experimental research
in that the focus of the research is on the individual rather than a
group. The procedures used in single-subject research are just as rigorous
as in other types of experimental research. Single-subject research is used
to test hypotheses. In general, this type of research is used to test the
hypothesis that a particular treatment will have an overt effect on one or
more behaviors. Because most research on behavior modification has used
single-subject research methods, the two are often confused and thought
to be the same. While single-subject methodology is appropriate and useful
in research on behavior modification, it is also appropriate and used for
other research topics.

This chapter has emphasized the need to collect data repeatedly and
carefully. The most commonly used method to collect data in this type of
research is observation. Thus, the method of observation, also described
in Chapter 7, was considered in some detail here.

The need for baseline data and the careful manipulation of variables
were described. Assessment of the effects of a single-subject study depends
upon having carefully collected baseline and treatment data. The length
of these phases should be kept as similar as possible.

Two fundamental designs, A-B-A and multiple baseline, were described
in detail and an actual study of each was presented. A-B-A designs
usually include a second intervention, A-B-A-B, and are sometimes referred
to as withdrawal designs. Multiple baseline designs include two or
more replications across persons, behaviors, or settings. The baselines for
later replications are longer than the earlier ones, thereby controlling for
threats to the internal validity of such studies.

The data in single-subject research are usually evaluated through visual
inspection. Statistical analysis is rare. Visual inspection considers such factors
as changes in the magnitude and rate of the behaviors being studied.


EXERCISES


1. What distinguishes single-subject research from other forms of experimental 
research?

2. Single-subject research is similar to certain quasi-experimental designs. Dis- 
cuss these similarities and how they are dissimilar. 

3. Why is single-subject research confused with behavior modification? In what 
ways are they different?

4. What is a baseline? How does the initial baseline differ from a subsequent one
in an A-B-A design?

5. A researcher's baseline stabilizes after four sessions and she begins the
intervention. If there is no effect after four sessions, should she continue the
treatment or reintroduce the baseline? Why?

6. Most single-subject studies are of overt behavior. What other types of research 
might use single-subject methods? 

7. How does an A-B-A-B design control for threats to the internal validity of the 
study? 

8. How does a multiple baseline design control for threats to the external validity 
of the study? 

9. Read the report of a single-subject experiment in a journal.

   a. What design was used?
   b. Were the variables clearly defined?
   c. Was the hypothesis clearly stated?
   d. Would a group design (Chapter 5) have been better? Why or why not?
   e. Were the phase lengths appropriate?
   f. What method was used to collect the data? If observation, how were the
      data recorded?
   g. How were the data evaluated? Was the evaluation appropriate?
   h. Were the conclusions clearly stated?
   i. Were the conclusions substantiated by the data presented?


METHODS AND TOOLS OF RESEARCH


To carry out any of the types of research investigation described in the 
preceding chapters, data must be gathered with which to test the hypothesis. 
Many different methods and procedures have been developed to aid in 
the acquisition of data. These tools employ distinctive ways of describing 
and quantifying the data. Each is particularly appropriate for certain sources 
of data, yielding information of the kind and in the form that can be most 
effectively used. 

Many writers have argued the superiority of the interview over the 
questionnaire, or the use of the psychological test over the interview. The 
late Arvil S. Barr, University of Wisconsin teacher and researcher, resolved 
discussions of this sort by asking, "Which is better, a hammer or a handsaw?" 
Like the tools in the carpenter's chest, each is appropriate in a given sit- 
uation. 

Some researchers become preoccupied with one method of inquiry 
and neglect the potential of others. Examining the publications of some 
authors shows that many studies use the same method applied to many 
different problems, possibly indicating that the authors have become at- 
tached to one particular method and choose problems that are appropriate 
to its use. 
There is probably too much dependence upon single methods of
inquiry. Because each data-gathering procedure or device has its own par- 
ticular weakness or bias, there is merit in using multiple methods, supple- 
menting one with others to counteract bias and generate more adequate 
data. Students of research should familiarize themselves with each of these 


Reliability and validity are essential to the effectiveness of any data-gathering 
procedure. These terms are defined here in the most general way. A more 
detailed discussion is presented later in the chapter. 

Reliability is the degree of consistency that the instrument or procedure 
demonstrates: Whatever it is measuring, it does so consistently. Validity is 
that quality of a data-gathering instrument or procedure that enables it to
measure what it is supposed to measure. Reliability is a necessary but not 
sufficient condition for validity. That is, a test must be reliable for it to be 
valid, but a test can be reliable and still not be valid. 


is more difficult to determine these qualities for some other data-gathering 
instruments or procedures, such as observation, interview, or the use of 


A brief consideration of the problems of validity and reliability follows 
the discussion of each type of data-gathering procedure. 


QUANTITATIVE STUDIES 


A nominal scale. A nominal scale is the least precise method of quan- 
tification. A nominal scale describes differences between things by assigning 


TABLE 7-1 Academic Rank of Members of the Instructional Staff of Southland
College

                        MALE    FEMALE    TOTAL

Professors               20        4        24
Associate professors     34       22        56
Assistant professors     44       30        74
Instructors              26       14        40
Lecturers                17        5        22


them to categories—such as professors, associate professors, assistant pro- 
fessors, instructors, or lecturers—and to subsets such as males or females 
(see Table 7-1).

Nominal data are counted data. Each individual can be a member of 
only one set, and all other members of the set have the same defined 
characteristic. Such categories as nationality, gender, socioeconomic status, 
race, occupation, or religious affiliation provide examples. Nominal scales 
are nonorderable, but in some situations this simple enumeration or count- 
ing is the only feasible method of quantification and may provide an ac- 
ceptable basis for statistical analysis. 
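As a rough illustration of what "counted data" permits, the sketch below tallies the frequencies from Table 7-1. The variable names are hypothetical, and the male count for lecturers is an assumption inferred from the row total (22 minus 5). Only counts and the most frequent category are computed, since averaging category labels would be meaningless.

```python
# Counts from Table 7-1 as (male, female) per academic rank.
# Assumption: the male count for lecturers is inferred as 22 - 5 = 17.
staff = {
    "Professors": (20, 4),
    "Associate professors": (34, 22),
    "Assistant professors": (44, 30),
    "Instructors": (26, 14),
    "Lecturers": (17, 5),
}

# Nominal data support counting: totals per category and per subset.
row_totals = {rank: male + female for rank, (male, female) in staff.items()}
male_total = sum(male for male, _ in staff.values())
female_total = sum(female for _, female in staff.values())

# The most frequent category (the mode) is a legitimate summary of counts.
modal_rank = max(row_totals, key=row_totals.get)

print(row_totals)
print(male_total, female_total, modal_rank)
```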


An ordinal scale. Sometimes it is possible to indicate not only that 
things differ but that they differ in amount or degree. Ordinal scales permit 
the ranking of items or individuals from highest to lowest. The criterion 
for highest to lowest ordering is expressed as relative position or rank in 
a group: 1st, 2nd, 3rd, 4th, 5th, . . . nth. Ordinal measures have no absolute
values, and the real differences between adjacent ranks may not be equal. 
Ranking spaces them equally, though they may not actually be equally 
spaced. The following example illustrates this limitation: 


           HEIGHT IN    DIFFERENCE
SUBJECT    INCHES       IN INCHES     RANK

Jones         76                      1st
Smith         68            8         2nd
Brown         66            2         3rd
Porter        59            7         4th
Taylor        58            1         5th
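The same limitation can be checked with a short sketch that ranks the five subjects and then compares the gaps between adjacent ranks. The heights are those of the example; the variable names are illustrative only.

```python
# Heights in inches from the example.
heights = {"Jones": 76, "Smith": 68, "Brown": 66, "Porter": 59, "Taylor": 58}

# Rank from tallest (1st) to shortest (5th).
ordered = sorted(heights, key=heights.get, reverse=True)
ranks = {name: position for position, name in enumerate(ordered, start=1)}

# Adjacent ranks always differ by 1, but the differences in inches are unequal.
gaps = [heights[a] - heights[b] for a, b in zip(ordered, ordered[1:])]

print(ranks)
print(gaps)  # [8, 2, 7, 1]
```

The ranks space the subjects evenly, while the underlying differences range from 1 inch to 8 inches.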


An interval scale. An arbitrary scale based on equal units of measurement
indicates how much of a given characteristic is present. The difference
in amount of the characteristic possessed by persons with scores of
90 and 91 is assumed to be equivalent to that between persons with scores 
of 60 and 61. 

The interval scale represents a decided advantage over nominal and 
ordinal scales because it indicates the relative amount of a trait or char- 
acteristic. Its primary limitation is the lack of a true zero. It does not have 
the capacity to measure the complete absence of the trait, and a measure 
of 90 does not mean that a person has twice as much of the trait as someone 
with a score of 45. Psychological tests and inventories are interval scales
and have this limitation, although their scores can be added, subtracted,
multiplied, and divided.
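One way to see why the lack of a true zero matters is to move the arbitrary zero point and observe what changes. The sketch below is illustrative only: the scores of 90 and 45 come from the text, while the 10-point shift and the function name are assumptions made for the example.

```python
# Scores on an interval scale whose zero point is arbitrary.
score_a, score_b = 90, 45

def rescale(score, shift):
    """Move the arbitrary zero point by `shift` units."""
    return score + shift

shift = 10  # hypothetical relocation of the zero point
a2, b2 = rescale(score_a, shift), rescale(score_b, shift)

# Differences survive the shift, so statements about intervals are meaningful.
difference_before = score_a - score_b
difference_after = a2 - b2

# Ratios do not survive it, so "twice as much" is not a meaningful claim.
ratio_before = score_a / score_b
ratio_after = a2 / b2

print(difference_before, difference_after)
print(ratio_before, round(ratio_after, 3))
```

On a ratio scale the zero point is fixed by the absence of the property, so no such shift is available and ratio statements hold.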


A ratio scale. A ratio scale has the equal interval properties of an 
interval scale but has two additional features: 


1. The ratio scale has a true zero. It is possible to indicate the complete
absence of a property. For example, the zero point on a centimeter
scale indicates the complete absence of length or height.

2. The numerals of the ratio scale have the qualities of real numbers
and can be added, subtracted, multiplied, and divided and expressed
in ratio relationships. For example, 5 grams is one-half of 10 grams;
15 grams is three times 5 grams; and on a laboratory weighing scale,
two 1-gram weights will balance a 2-gram weight. One of the advantages
enjoyed by practitioners in the physical sciences is the ability to
describe variables in ratio scale form. The behavioral sciences are
generally limited to describing variables in interval scale form, a less
precise type of measurement.


Proceeding from the nominal scale (the least precise type) to ratio 
scale (the most precise), increasingly relevant information is provided. If
the nature of the variables permits, the scale that provides the most precise 
description should be used. 

In behavioral research, many of the qualities or variables of interest 
are abstractions and cannot be observed directly. It is necessary to define 
them in terms of observable acts, from which the existence and amount of 
the variables are inferred. This operational definition tells what the re- 
searcher must do to measure the variable. For example, intelligence is an 
abstract quality that cannot be observed directly. Intelligence may be de- 
fined operationally as scores achieved on a particular intelligence test. 

Operational definitions have limited meaning. Their interpretation 
is somewhat subjective, which may lead experts to disagree about their 
validity. The fact that numerical data are generated does not ensure valid
observation and description, for ambiguities and inconsistencies are often 
represented quantitatively. 
Some behavioral scientists feel that excessive emphasis on quantification
may result in the measurement of fragmentary qualities not relevant
to real behavior. The temptation to imitate the descriptive measures of the 
physical scientist has led some behavioral researchers to focus their atten- 
tion on trivial, easy-to-measure elements of behavior, resulting in preten- 
tious studies of little value. 

The limitations that have been mentioned are not intended to mini- 
mize the significance of quantitative methods. Progress is being made in 
developing more valid operational definitions and better observation tech- 
niques. The quantitative approach is not only useful but may be considered 
indispensable in most types of research. It has played an essential role in 
the history and development of science as it progressed from pure philo- 
sophical speculation to modern empirical, verifiable observation. 


QUALITATIVE STUDIES 


Qualitative studies are those in which the description of observations is not 
ordinarily expressed in quantitative terms. It is not that numerical measures 
are never used but that other means of description are emphasized. For 
example, in the ethnographic studies described in Chapter 4, when the 
researcher gathers data by participant observation, interviews, and the ex- 
amination of documentary materials, little measurement may be involved. 
However, observations may be classified into discrete categories, yielding 
nominal level data. 

Piaget, a scientist who had a distinguished research career of more 
than 50 years, came to the conclusion that a nonquantitative search for 
explanations would be fruitful in the study of human development. His 
qualitative approach, known as genetic epistemology, has suggested another 
method of observing behavior and the nature of human growth and de- 
velopment. He built his logic of operations upon what he observed when 
children of different age levels were confronted with tasks that required
reasoning for their solution.

In some types of investigation, events and characteristics may appropriately
be described qualitatively. Topics such as these may serve as examples:


The Effect of Diplomatic Failure upon a Presidential Election
Types of Political Leaders and Their Influence upon Social Unrest
The Terrorist as an Agent of Social Change
The Extended Tribal Family and Its Influence upon Child-rearing Practices
The Influence of Tribal Religious Beliefs upon Attitudes toward Death and Dying
The Influence of the Socioeconomic Status of a Group of High School Students in the Selection of Career Goals
The Influence of the Home on Pupil-Teacher Relationships


To conclude this discussion on quantitative and qualitative studies, 
several observations may be appropriate. It may be unwise to try to draw 
a hard-and-fast distinction between qualitative and quantitative studies. The 
difference is not absolute; it is one of emphasis. One emphasis should not 
be considered superior to the other. The appropriate approach would 
depend upon the nature of the variables under consideration and the 
objectives of the researchers. 

Traditionally, educational research has emphasized the quantitative 
approach. A substantial number of researchers feel that qualitative studies 
have, for too long, remained outside the mainstream of educational re- 
search. Some investigations could be strengthened by supplementing one 
approach with the other. 


PSYCHOLOGICAL TESTS AND INVENTORIES 


As data-gathering devices, psychological tests are among the most useful 
tools of educational research, for they provide the data for most experi- 
mental and descriptive studies in education. Because here we are able to 
examine only limited aspects of the nature of psychological testing, students 
of educational research should consult other volumes for a more complete 
discussion (such as Anastasi, 1982; Cronbach, 1984).

A psychological test is an instrument designed to describe and measure 
a sample of certain aspects of human behavior. Tests may be used to 
compare the behavior of two or more persons at a particular time or of 
one or more persons at different times. Psychological tests yield objective 
and standardized descriptions of behavior, quantified by numerical scores. 
Under ideal conditions, achievement or aptitude tests measure the best 
performance of which individuals are capable. Under ideal conditions, 
inventories attempt to measure typical behavior. Tests and inventories are 
used to describe status (or a prevailing condition at a particular time), to 
measure changes in status produced by modifying factors, or to predict 
future behavior on the basis of present performance. 

In the simple experiment on reading headlines described in the chap- 
ter on experimental research (Chapter 5), test scores were used to equate 
the experimental and control groups, to describe relative skill at this task 
prior to the application of the teaching methods, to measure student gains 
resulting from the application of the experimental and control teaching 
methods, and to evaluate the relative effectiveness of teaching methods. 
This example of classroom experimentation illustrates how experimental 
data may be gathered through the administration of tests. 
In descriptive research studies, tests are frequently used to describe
prevailing conditions at a particular time. How does a student compare 
with those of his or her own age or grade in school achievement? How 
does a particular group compare with groups in other schools or cities? 

In school surveys for the past several decades, achievement tests have 
been used extensively in the appraisal of instruction. Because tests yield
quantitative descriptions or measures, they make possible more precise analysis
than can be achieved through subjective judgment alone.

There are many ways of classifying psychological tests. One distinction 
is made between performance tests and paper-and-pencil tests. Performance
tests, usually administered individually, require that the subjects manipulate 
objects or mechanical apparatus while their actions are observed and re- 
corded by the examiner. Paper-and-pencil tests, usually administered in 
groups, require the subjects to mark their response on a prepared sheet. 

Two other classes of tests are power versus timed or speed tests. Power 
tests have no time limit, and the subjects attempt progressively more dif- 
ficult tasks until they are unable to continue successfully. Timed or speed 
tests usually involve the element of power, but in addition, they limit the 
time the subjects have to complete certain tasks. 

Another distinction is that made between nonstandardized, teacher- 
made tests and standardized tests. The test that the classroom teacher con- 
structs is likely to be less expertly designed than that of the professional, 
although it is based upon the best logic and skill that the teacher can 
command and is usually "tailor-made" for a particular group of pupils. 

Which type of test is used depends on the test's intended purpose. 
The standardized test is designed for general use. The items and the total 
scores have been carefully analyzed, and validity and reliability have been 
established by careful statistical controls. Norms have been established based 
upon the performance of many subjects of various ages living in many 
different types of communities and geographic areas. Not only has the 
content of the test been standardized, but the administration and scoring 
have been set in one pattern so that those subsequently taking the tests will 
take them under like conditions. As far as possible, the interpretation has
also been standardized. 

Although it would be inaccurate to claim that all standardized tests 
meet optimum standards of excellence, the test authors have attempted to 
make them as sound as possible in the light of the best that is known by 
experts in test construction, administration, and interpretation. 

Nonstandardized or teacher-made tests are designed for use with a
specific group of persons. Reliability and validity are not usually established.
However, more practical information may be derived from a teacher-made
test than from a standardized one because the test is given to the group
for whom it was designed and is interpreted by the teacher/test-maker.

Psychological tests may also be classified in terms of their purpose,
that is, what types of psychological traits they describe and measure.


Achievement Tests


Achievement tests attempt to measure what an individual has learned— 
his or her present level of performance. Most tests used in schools are 
achievement tests. They are particularly helpful in determining individual 
or group status in academic learning. Achievement test scores are used in 
placing, advancing, or retaining students at particular grade levels. They 
are used in diagnosing strengths and weaknesses and as a basis for awarding 
prizes, scholarships, or degrees. 

Frequently achievement test scores are used in evaluating the influences
of courses of study, teachers, teaching methods, and other factors
considered to be significant in educational practice. In using tests for eval- 
uative purposes, it is important not to generalize beyond the specific ele- 
ments measured. For example, to identify effective teaching exclusively 
with the limited products measured by the ordinary achievement test would 
be to define effective teaching too narrowly. It is essential that researchers 
recognize that the elements of a situation under appraisal need to be
evaluated on the basis of a number of criteria, not merely on a few limited
aspects.


Aptitude Tests


Aptitude tests attempt to predict the degree of achievement that may be 
expected from individuals in a particular activity. To the extent that they 
measure past learning, they are similar to achievement tests. To the extent 
that they measure nondeliberate or unplanned learning, they are different. 
Aptitude tests attempt to predict an individual's capacity to acquire im- 
proved performance with additional training. 

Actually, capacity (or aptitude) cannot be measured directly. Aptitude 
can only be inferred on the basis of present performance, particularly in 
areas where there has been no deliberate attempt to teach the behaviors 
to be predicted.

Intelligence is a good example of a trait that cannot be measured 
directly. An individual's intelligence quotient (IQ) is generally derived from
comparing his or her current knowledge with that of a group of persons of equal
chronological age who were administered the test by the author or the 
author's employees. If a person scores relatively high, average, or low, we 
assume that it is a measure of how effectively a person has profited from 
both formal and informal opportunities for learning. To the extent that 
others have had similar opportunities, we predict an individual's ability for 
future learning. This is a matter of inference rather than of direct meas- 
urement. Because it has proved useful in predicting future achievement, 
particularly in academic pursuits, we consider this concept of intelligence 
measurement a valid application.

Aptitude tests have been designed to predict improved performance
with further training in many areas. These inferred measurements have
been applied to mechanical and manipulative skills, musical and artistic
pursuits, and many professional areas involving many types of predicted
ability.

In music, for example, ability to remember and discriminate between 
differences in pitch, rhythm pattern, intensity, and timbre seems to be
closely related to future levels of development in musicianship. Present 
proficiency in these tasks provides a fair predictive index of an individual's 
ability to profit from advanced instruction, particularly when the individual 
has had little formal training in music prior to the test. 

Aptitude tests may be used to divide students into relatively homo- 
geneous groups for instructional purposes, identify students for scholarship 
grants, screen individuals for particular educational programs, or help 
guide individuals into areas where they are most likely to succeed. 

Aptitude tests, particularly those that deal with academic aptitude, 
that are used for purposes of placement and classification have become 
highly controversial, and their use has been prohibited in many communities.
The fact that some individuals with culturally different backgrounds
do not score well on these tests has led to charges of discrimination against 
members of minority groups. The case has been made that most of these 
tests do not accurately predict academic achievement because their contents 
are culturally biased. Efforts are being made to develop culture-free tests 
that eliminate this undesirable quality. However, it is extremely difficult to 
eliminate culture totally and develop one test that is equally fair for all. 


Interest Inventories 


Interest inventories attempt to yield a measure of the types of activities 
that an individual has a tendency to like and to choose. One kind of instrument
has compared the subject's pattern of interest to the interest
patterns of successful practitioners in a number of vocational fields. A 
distinctive pattern has been discovered to be characteristic of each field. 
The assumption is that an individual is happiest and most successful work- 
ing in a field most like his or her own measured profile of interest. 

Another inventory is based on the correlation between a number of 
activities from the areas of school, recreation, and work. These related 
activities have been identified by careful analysis with mechanical, com- 
putational, scientific, persuasive, artistic, literary, musical, social service, 
and clerical areas of interest. By sorting the subject's stated likes and dislikes 
into various interest areas, a percentile score for each area is obtained. It 
is then assumed that the subject will find his or her area of greatest interest 
where the percentile scores are relatively high. 

Interest blanks or inventories are examples of self-report instruments 
in which individuals note their own likes and dislikes. These self-report 
instruments are really standardized interviews in which the subjects, through
introspection, indicate feelings that may be interpreted in terms of what is 
known about interest patterns. 


Personality Inventories


Personality scales are usually self-report instruments. The individual checks
responses to certain questions or statements. These instruments yield scores 
which are assumed or have been shown to measure certain personality 
traits or tendencies. 

Because of individuals' inability or unwillingness to report their own 
reactions accurately or objectively, these instruments may be of limited 
value. Part of this limitation may be due to the inadequate theories of 
personality upon which some of these inventories have been based. At 
best, they provide data that are useful in suggesting the need for further 
analysis. Some have reasonable empirical validity with particular groups 
of individuals but prove to be invalid when applied to others. For 
example, one personality inventory has proven valuable in yielding scores 
that correlate highly with the diagnoses of psychiatrists in clinical situa- 
tions. But when applied to college students, its diagnostic value has proved 
disappointing. 

The development of instruments of personality description and meas- 
urement is relatively recent, and it is likely that continued research in this 
important area will yield better theories of personality and better instru- 
ments for describing and measuring its various aspects. 

The Mooney Problems Check List (1941) is an inventory to be used 
by students in reporting their own problems of adjustment. The subjects 
are asked to indicate on the checklist the things that trouble them. From 
a list of these items, classified into different categories, a picture of the 
students' problems, from their own viewpoint, is drawn. Although the most 
useful interpretation may result from an item analysis of personal problems, 
the device does yield a quantitative score which may indicate the degree 
of difficulty that students feel they are experiencing in their adjustment. 
This instrument has been used as a research device to identify and describe 
the nature of the problems facing individuals and groups of individuals in 
a school. 

The tendency to withhold embarrassing responses and to express 
those that are socially acceptable, emotional involvement of individuals with 
their own problems, lack of insight—all these limit the effectiveness of 
personal and social-adjustment scales. Some psychologists believe that the 
projective type of instrument offers greater promise, for these devices 
attempt to disguise their purpose so completely that the subject does not 
know how to appear in the best light.


Projective Devices


A projective instrument enables subjects to project their internal feelings, 
attitudes, needs, values, or wishes to an external object. Thus the subjects 
may unconsciously reveal themselves as they react to the external object.
The use of projective devices is particularly helpful in counteracting the 
tendency of subjects to try to appear in their best light, to respond as they 
believe they should. 

Projection may be accomplished through a number of techniques: 


1. Association. The respondent is asked to indicate what he or she sees, 
feels, or thinks when presented with a picture, cartoon, ink blot, word, 
or phrase. The Thematic Apperception Test, the Rorschach Ink Blot 
Test, and various word-association tests are familiar examples. 

2. Completion. The respondent is asked to complete an incomplete sen- 
tence or task. A sentence-completion instrument may include such 
items as: 

My greatest ambition is 
My greatest fear is 
I most enjoy 
I dream a great deal about 
I get very angry when 
If I could do anything I wanted it would be to

3. Role-playing. Subjects are asked to improvise or act out a situation in 
which they have been assigned various roles. The researcher may 
observe such traits as hostility, frustration, dominance, sympathy, in- 
security, prejudice—or the absence of such traits. 

4. Creative or constructive. Permitting subjects to model clay, finger paint, 
play with dolls, play with toys, or draw or write imaginative stories 
about assigned situations may be revealing. The choice of color, form, 
words, the sense of orderliness, evidence of tensions, and other re- 
actions may provide opportunities to infer deep-seated feelings. 


QUALITIES OF A GOOD TEST OR INVENTORY 


Reliability 


A test is reliable to the extent that it measures whatever it is measuring 
consistently. In tests that have a high coefficient of reliability, errors of 
measurement have been reduced to a minimum. Reliable tests are stable 
in whatever they measure and yield comparable scores upon repeated
administration. An unreliable test is comparable to a stretchable rubber 
yardstick that yields different measurements each time it is applied. 
The reliability or stability of a test is usually expressed as a correlation
coefficient. There are a number of types of reliability:


1. Stability over time (test-retest). The scores on a test will be highly
correlated with scores on a second administration of the test to the
same subjects at a later date if the test has good test-retest reliability.

2. Stability over item samples (equivalent or parallel forms). Some tests
have two or more forms that may be used interchangeably. In these
cases, the scores on a test will be highly correlated with scores on an
alternative form of the test (for example, scores on form A will be
highly correlated with scores on form B) if the test has this type of
reliability.

3. Stability of items (internal consistency). Scores on certain test items
will be highly correlated with scores on other test items. There are
two methods of measuring for internal consistency.

a. Split halves. This can be accomplished in two different ways. Scores
on the odd-numbered items can be correlated with the scores on
the even-numbered items. Second, on some but not most tests,
the scores on the first half of the test can be correlated with scores
on the second half of the test. Because the correlations that would
result from the above splits would be for only half a test, and
because generally the longer a test is, the more internal consistency
it has, the correlation coefficient is modified using the
Spearman-Brown formula.

b. Kuder-Richardson formula. This formula is a mathematical test
that results in the average correlation of all possible split half
correlations (Cronbach, 1951).

4. Stability over scorers (inter-scorer reliability). Certain types of tests,
in particular projective tests, leave a good deal to the judgment of the
person scoring the test. Scorer reliability can be determined by having
two persons independently score the same set of test papers and then
calculating a correlation between the two sets of scores.

5. Standard error of measurement. This statistic permits the interpretation
of individual scores obtained on a test. Because tests are not
perfectly reliable, we know that the score an individual receives on a
given test is not necessarily a true measure of his or her trait. The
standard error of measurement tells us how much we can expect an
obtained score to differ from the individual's true score.
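The odd-even split-half procedure, the Spearman-Brown correction, and the standard error of measurement can be sketched in a few lines of Python. The item responses below are invented for illustration and the function names are hypothetical. The standard error formula used (standard deviation times the square root of one minus the reliability) is the conventional one; it is an assumption here, since the chapter does not state a formula.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full test."""
    return 2 * r_half / (1 + r_half)

# Hypothetical data: six examinees, eight items scored 1 (right) or 0 (wrong).
responses = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

# Odd-numbered items sit at indexes 0, 2, 4, 6; even-numbered at 1, 3, 5, 7.
odd_scores = [sum(row[0::2]) for row in responses]
even_scores = [sum(row[1::2]) for row in responses]
total_scores = [sum(row) for row in responses]

r_half = pearson_r(odd_scores, even_scores)
r_full = spearman_brown(r_half)

# Conventional standard error of measurement: SD * sqrt(1 - reliability).
sem = stdev(total_scores) * sqrt(1 - r_full)

print(f"half-test r = {r_half:.2f}, corrected r = {r_full:.2f}, SEM = {sem:.2f}")
```

The same `pearson_r` helper would serve for test-retest, parallel-forms, or inter-scorer reliability, with the two lists holding the paired sets of scores in question.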


The reliability of a test may be raised by increasing the number of
items of equal quality to the other items. Carefully designed directions for
the administration of the test with no variation from group to group,
providing an atmosphere free from distractions and one that minimizes
boredom and fatigue, will also improve the reliability of the testing
instrument.
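The effect of lengthening a test can be estimated with the general form of the Spearman-Brown formula, sketched below. This is an illustration under stated assumptions: the starting reliability of 0.60 is hypothetical, the function name is invented, and the formula presumes that the added items are equal in quality to the existing ones.

```python
def lengthened_reliability(r, k):
    """Estimated reliability when a test is made k times as long.

    Assumes the added items are equal in quality to the existing ones.
    """
    return k * r / (1 + (k - 1) * r)

current = 0.60  # hypothetical reliability of the existing test

# Leaving the test as is, doubling it, and tripling it.
for factor in (1, 2, 3):
    print(factor, round(lengthened_reliability(current, factor), 3))
```

Under these assumptions each added block of items raises reliability by less than the previous one, so lengthening a test has diminishing returns.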


Validity 


In general, a test is valid if it measures what it claims to measure. Validity 
can also be thought of as utility. For the tester's particular purpose, is the 
test useful? There are several types of validity, and different types of tests 
and uses of tests need different types of validity. Content validity refers to 
the degree to which the test actually measures, or is specifically related to, 
the traits for which it was designed. It shows how adequately the test samples 
the universe of knowledge and skills that a student is expected to master. 
Content validity is based upon careful examination of course textbooks, 
syllabi, objectives, and the judgments of subject matter specialists. The 
criterion of content validity is often assessed by a panel of experts in the 
field who judge its adequacy, but there is no numerical way to express it. 
Content validity is particularly important for achievement tests but not very 
important for aptitude tests. 

Construct validity is the degree to which scores on a test can be ac- 
counted for by the explanatory constructs of a sound theory. If one were 
to study such a construct as dominance, one would hypothesize that people 
who have this characteristic will perform differently from those who do 
not. Theories can be built describing how dominant people behave in a 
distinctive way. If this is done, dominant people can be identified by ob- 
servation of their behavior, rating or classifying them in terms of the theory. 
A device could then be designed to have construct validity to the degree 
that instrument scores are systematically related to the judgments made by 
observation of behavior identified by the theory as dominant. Intelligence 
tests also require adequate construct validity. Because different tests are
based on different theories, each test should be shown to measure what 
the appropriate theory defines as intelligence. Construct validity is partic- 
ularly important for personality and aptitude tests. 

Criterion-related validity is a broad term that actually refers to two dif- 
ferent types of validity with different time frames. 


1. Predictive validity refers to the usefulness of a test in predicting some 
future performance, such as the usefulness of the high school Scho- 
lastic Aptitude Test in predicting college grade-point averages. If a 
test is designed to pick out good candidates for appointment as shop 
foremen, and test scores show a high positive correlation with later 
actual success on the job, the test has a high degree of predictive 
validity, whatever factors it actually measures. It predicts well. It serves 
a useful purpose.

But before a test can be evaluated on the basis of predictive 
validity, success on the job must be accurately described and measured. 
The criteria of the production of the department, the judgment of 
supervisors, or measures of employee morale might serve as evidence. 
Because these criteria might not be entirely satisfactory, however, 
predictive validity is not easy to assess. It is often difficult to discover 
whether the faults of prediction lie in the test, in the criteria of success, 
or both. 


2. Concurrent validity refers to the usefulness of a test in closely relating 
to other measures, such as present academic grades, teacher ratings, 
or scores on another test of known validity. 

Tests are often validated by comparing their results with a test 
of known validity. A well-known scale of personal adjustment, the 
Minnesota Multiphasic Personality Inventory, required sorting nearly 
500 cards into three categories, yes, no, and cannot say. The equipment 
was expensive, and it could not be easily administered to large groups 
at the same time. A paper-and-pencil form was devised, using the 
simple process of checking responses to printed items on a form. 
This form could be administered to a large group at one time 
and then scored by machine, all with little expense. The results 
were so similar to those of the more time-consuming, expensive card-sorting 
process that the latter has been almost completely replaced. This is 
the process of establishing concurrent validity; in this case, by comparing 
an expensive individual device with an easy-to-administer group in- 
strument. 

In like manner, performance tests have been validated against 
paper-and-pencil tests, and short tests against longer tests. Through 
this process, more convenient and more appropriate tests can be de- 
vised to accomplish the measurement of behavior more effectively. 


Criterion-related validity is expressed as the coefficient of correlation 
between test scores and some measure of future performance, or between 
test scores and scores on another test or measure of known validity. The 
subject of correlation is explained in detail in Chapter 8. 
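
As a rough illustration of how such a coefficient is obtained, the sketch below computes a Pearson product-moment correlation between test scores and a later criterion measure. The function name, the scores, and the grade-point averages are hypothetical values invented for the example; they are not data from any study cited in this chapter.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each score from its own mean
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: aptitude test scores and later grade-point averages
test_scores = [52, 61, 47, 70, 66, 58]
later_gpa = [2.6, 3.1, 2.4, 3.7, 3.2, 2.9]

validity_coefficient = pearson_r(test_scores, later_gpa)
print(round(validity_coefficient, 2))
```

The same function would serve for concurrent validity if the second list held scores on another measure of known validity gathered at about the same time.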

A test may be reliable even though it is not valid. However, in order 
for a test to be valid, it must also be reliable. That is, a test can consistently 
measure (reliability) nothing of interest (be invalid), but if a test measures 
what it is designed to measure (validity), it must do so consistently (reliably). 


Economy


Tests that can be given in a short period of time are likely to gain the 
cooperation of the subject and to conserve the time of all those involved 
in test administration. The matter of expense of administering a test is 
often a significant factor if the testing program is being operated on a 
limited budget. 

Ease of administration, scoring, and interpretation is an important 
factor in selecting a test, particularly when expert personnel or an adequate 
budget are not available. Many good tests are easily and effectively admin- 
istered, scored, and interpreted by the classroom teacher, who may not be 
an expert. 


Interest 


Tests that are interesting and enjoyable help to gain the cooperation of 
the subject. Those that are dull or seem silly may discourage or antagonize 
the subject. Under these unfavorable conditions, the test is not likely to 
yield useful results. 

In selecting a test, it is important to recognize that a good test does 
not necessarily possess all the desirable qualities for all subjects on all levels 
of performance. Within a certain range of age, maturity, or ability, a test 
may be suitable. For other individuals outside that range, the test may be 
quite unsatisfactory and a more appropriate one needed. 

The selection should be made after careful examination of the stand- 
ardizing data contained in the test manual and extensive analysis of pub- 
lished evaluations of the instrument. Research workers should select the 
most appropriate standardized tests available. Detailed reports of their 
usefulness and limitations are usually supplied in the manual furnished by 
the publisher. The considered judgments of outside experts are also avail- 
able. Mitchell's (1985) Mental Measurements Yearbook, the best single reference 
on psychological tests, contains many critical evaluations of published tests, 
each contributed by an expert in the field of psychological measurement. 
Usually, several different evaluations are included for each test. Because 
the reports are not duplicated from one volume to another, it is advisable 
to consult Tests in Print (Mitchell, 1983) or previous Yearbooks for additional 
reports not included in the current volume. In addition to the reviews and 
evaluations, the names of test publishers, prices, forms, and appropriate 
uses are included. Readers are also urged to consult the listings and reviews 
of newly published psychological tests in the Journal of Educational Meas- 
urement. 

When psychological tests are used in educational research, one should 
remember that standardized test scores are only approximate measures of 
the traits under consideration. This limitation is inevitable and may be 
ascribed to a number of possible factors: 


1. Errors inherent in any psychological test; no test is completely valid 
or reliable 

2. Errors that result from poor test conditions, inexpert or careless 
administration or scoring of the test, or faulty tabulation of test scores 

3. Inexpert interpretation of test results 

4. The choice of an inappropriate test for the specific purpose in mind. 


OBSERVATION 


From the earliest history of scientific activity, observation has been the 
prevailing method of inquiry. Observation of natural phenomena, aided 
by systematic classification and measurement, led to the development of 
theories and laws of nature's forces. Observation continues to characterize 
all research: experimental, descriptive, and historical. The use of the tech- 
nique of participant observation in ethnological research was described in 
Chapter 4. The importance of observational techniques for single-subject 
research, and some aspects of the methodology involved in using them, 
were discussed in Chapter 6. 

A reason why observation is most often used in single-subject exper- 
imental research is that it is very costly to observe a sufficient sample of 
behavior for a large number of subjects. Observation must occur during a 
number of baseline and intervention sessions in this type of research. In a 
study described in Chapter 6, Fantuzzo and Clement (1981) observed the 
attending behavior of their subjects. This is an example of the type of 
observation technique known as time sampling (see Chapter 6 for a descrip- 
tion). Every 60 seconds, the subjects were observed to see if they were 
attending to their task. 
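
A minimal sketch of how time-sampling records of this kind might be tallied is shown below. It assumes that each observation moment is recorded simply as attending or not attending; the function name, the subject labels, and the records are hypothetical and are not taken from the Fantuzzo and Clement study.

```python
def summarize_time_samples(records):
    """Tally attending and non-attending checks for each subject.

    `records` maps a subject label to a list of booleans, one per
    observation moment (True = attending to the task at that moment).
    """
    summary = {}
    for subject, checks in records.items():
        if not checks:
            raise ValueError(f"no observations recorded for {subject}")
        attending = sum(1 for c in checks if c)
        summary[subject] = {
            "attending": attending,
            "not_attending": len(checks) - attending,
            "percent_attending": 100.0 * attending / len(checks),
        }
    return summary


# Hypothetical records: ten checks, one every 60 seconds
records = {
    "Child A": [True, True, False, True, True, False, True, True, True, True],
    "Child B": [False, True, False, False, True, True, False, True, False, False],
}

summary = summarize_time_samples(records)
for subject, counts in summary.items():
    print(subject, counts)
```

Because only the moment of each check is recorded, the percentage estimates how much of the session was spent attending; it does not count separate episodes of behavior.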

In Chapter 5, a study by Hall et al. (1973) was used as an example 
of an equivalent time-samples design. Observation was used to collect the 
data in this study, also. The observers counted the number of occurrences 
of aggressive behavior, the technique known as frequency count (described 
in Chapter 6). 

In experimental research, observation is most frequently the method 
of choice for behavior modification studies that frequently use single-sub- 
ject research designs (e.g., Fantuzzo & Clement, 1981). It is rare to see 
observation used in group designs (those described in Chapter 5), unfor- 
tunately more because of the cost than because it is less appropriate than 
the other measures used in its place. 

As a data-gathering device, direct observation may also make an im- 
portant contribution to descriptive research. Certain types of information 
can best be obtained through direct examination by the researcher. When 
the information concerns aspects of material objects or specimens, the 
process is relatively simple, and may consist of classifying, measuring, or 
counting. But when the process involves the study of a human subject in 
action, it is much more complex. 

One may study the characteristics of a school building by observing 
and recording such aspects as materials of construction, number of rooms 
for various purposes, size of rooms, amount of furniture and equipment, 
presence or absence of certain facilities, and other relevant aspects. Ade- 
quacy could then be determined by comparing these facilities with reason- 
able standards previously determined by expert judgment and research. 

In university athletic departments or professional football organiza- 
tions, observation has been used effectively to scout the performance of 
opposing football teams. Careful observation and recording of the skills 
and procedures of both team and individual players are made, and defenses 
and offenses are planned to cope with them. What formations or patterns 
of attack or defense are employed? Who carries the ball? Who does the 
passing, and where and with what hand does he pass? Who are the likely 
receivers, and how do they pivot and cut? 

During a game a coaching assistant may sit high in the stands, relaying 
strategic observations by phone to the coach on the bench. At the same 
time, every minute of play is being recorded on film for careful study by 
the coaching staff and players. Who missed his tackle when that play went 
through for 20 yards? Who missed his block when play number two lost 6 
yards? Careful study of these films provides valuable data on weaknesses 
to be corrected before the following game. Through the use of binoculars, 
the phone, the motion picture camera, and the video tape recorder, ob- 
servations can be carefully made and recorded. 

Although this example may seem inappropriate in a discussion of 
observation as a research technique, improving the performance of a foot- 
ball team is not altogether different from analyzing learning behavior in a 
classroom. The difference is one of degree of complexity. The objectives 
of the football team are more concretely identifiable than are the more 
complex purposes of the classroom. Yet some of the procedures of obser- 
vation so effective in football coaching may also be systematically employed 
in studying classroom performance. In some schools, teachers make short 
periodic classroom or playground observations of pupil behavior, which 
are filed in the cumulative folder. These recorded observations, known as 
anecdotal reports, may provide useful data for research studies. 

Laboratory experimentation seeks to describe action or behavior that 
will take place under carefully arranged and controlled conditions. But 
many important aspects of human behavior cannot be observed under the 
contrived conditions of the laboratory. Educational research seeks to de- 
scribe behavior under less rigid controls and more natural conditions. The 
behavior of children in a classroom situation cannot be effectively analyzed 
by observing their behavior in a laboratory. It is necessary to observe what 
they actually do in a real classroom. 

This does not suggest that observation is haphazard or unplanned. 
On the contrary, observation as a research technique must always be sys- 
tematic, directed by a specific purpose, carefully focused, and thoroughly 
recorded. Like other research procedures, it must be subject to the usual 
checks for accuracy, validity, and reliability. 

The observer must know just what to look for. He or she must be 
able to distinguish between the significant and insignificant aspects of the 
situation. Of course, objectivity is essential, and careful and accurate meth- 
ods of measuring and recording must be employed. 

Because human behavior is complex, and many important traits and 
characteristics are difficult or impossible to observe directly, they must be 
carefully defined in precise operational form. Perhaps a subject's interest 
can be operationally defined by the number of times a student volunteers 
to participate in discussion by raising his or her hand within a time sample 
period. Lack of concentration during a study period can be operationally 
defined by the number of times the student looks around, talks to another 
student, fiddles with a book, pen, or paper, or engages in other distracting 
acts within a time sample period. These examples of operational definitions 
may be unsatisfactory, but they do illustrate the kinds of behavior that can 
be directly observed. 

Behaviors that might mean different things to different observers 
must also be carefully defined. Acting-out behavior may mean very dis- 
ruptive acts such as fighting or, at the other extreme, any behavior for 
which the child did not first obtain permission, such as sharpening a pencil. 
In defining which behaviors meet the meaning of acting out, the researcher 
would need first to determine the class rules to avoid labeling permissible 
behavior as "acting out." 

Instruments such as the stopwatch, mechanical counter, camera, au- 
diometer, audio and videotape recordings, and other devices make possible 
observations that are more precise than mere sense observations. Having 
a permanent record on videotape also permits the researcher to start and 
stop the action for more accurate recording of data (especially when more 
than one subject is to be observed), to collect interobserver reliability data 
(see next section) without having two or more observers at the observation 
site, and to reexamine his or her ideas and decide on a new format for 
coding behaviors. Where feasible, we recommend the video recording of 
the behaviors under study. 

Systematic observation of human behavior in natural settings (e.g., 
classrooms) is to some degree an intrusion into the dynamics of the situ- 
ation. This intrusion may be reactive, that is, affect the behavior of the 
person(s) being observed. These potential confounding effects cannot be 
ignored. It is widely believed that individuals do not behave naturally when 
they know that they are being observed. The situation may become too 
artificial, too unnatural, to provide for a valid series of observations. 

Concealing the observer has been used to minimize this reactive effect. 
Cameras and one-way screens were used by Gesell (1948) to make unob- 
trusive observations of infant behavior. One-way glass and concealed mi- 
crophones and videotape recorders have been used in observing the be- 
havior of children in natural group activities so that the observers could 
see and hear without being seen and heard. 

Some authorities believe that the presence of an outside observer in 
the classroom over a period of time will be taken for granted, viewed as a 
part of the natural setting, and have little effect on the behavior observed. 
Others feel that introducing observers as active participants in the activities 
of the group will minimize the reactive effect more efficiently. 

Should the participant observers make their purposes known to the 
members of the group observed? Some feel that concealing the intentions 
of the participant observers raises ethical questions of invasion of privacy 
and establishes a false, hypocritical, interpersonal relationship with the 
individuals in the group. Do the ends of science justify the means of de- 
ception? In a society that increasingly questions the ethics of science, this 
issue must be confronted. 


Validity and Reliability of Observation 


For the researcher's observations to achieve a satisfactory degree of content 
validity, the truly significant incidents of behavior must be identified and 
sampled. Supplementing the knowledge and skill of the researcher, the 
judgment of experts in the field may help in selecting a limited number 
of observable incidents whose relationship to the qualities of interest is 
based upon sound, established theories. 

Criterion-related and construct validity may also be necessary de- 
pending on the purpose of the study and inferences made regarding be- 
haviors. For instance, if certain behaviors were considered to be evidence 
of a person being shy, construct validity is needed to demonstrate a rela- 
tionship between the behaviors and the underlying construct. 

The reactive effect of the intrusion of the observer as a threat to the 
reliability of the process has been mentioned. In addition, when researchers 
are sole observers, they unconsciously tend to see what they expect to see 
and to overlook those incidents that do not fit their theory. Their own 
values, feelings, and attitudes, based upon past experience, may distort 
their observations. It may be desirable to engage others who are then well- 
prepared as observers, restricting the researchers’ role to that of interpreter 
of the observations. Kazdin (1982) recommends that the researcher not be 
the observer. To further reduce the possibility of bias, the observers should 
be kept as ignorant as possible regarding the purposes and hypotheses of 
the study. This is called a blind. If the persons being observed also are 
unaware that they are participants in an experiment, thereby reducing the 
chances of a placebo effect, this becomes a double-blind. 

Independent observers should be prepared by participation in 


1. The development of the procedures for observing and recording 
observations 

2. The try-out or dry run phase of the procedure 

3. The critique of the results of the try-out phase. 


If more than one observer is necessary (as is usually the case), reliability 
among the observers should be demonstrated. This is done by having each 
participant observe with at least one other participant for a period of time 
and compare their recorded observations. Percent of agreement among 
observers should be quite high (usually 90% or higher) if the observations 
are to be considered reliable. High interobserver reliability is most likely 
when the behaviors to be observed are well defined and the observers well 
trained. 
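
The agreement check described above can be sketched as a simple interval-by-interval comparison. This is only one of several ways of expressing interobserver agreement and is assumed here for illustration; the function name and both observers' records are hypothetical.

```python
def percent_agreement(observer_1, observer_2):
    """Percentage of observation intervals on which two observers agree."""
    if len(observer_1) != len(observer_2) or not observer_1:
        raise ValueError("both observers must code the same, non-empty set of intervals")
    agreements = sum(1 for a, b in zip(observer_1, observer_2) if a == b)
    return 100.0 * agreements / len(observer_1)


# Hypothetical codes for ten intervals ("on" = on task, "off" = off task)
obs_1 = ["on", "on", "off", "on", "off", "on", "on", "on", "off", "on"]
obs_2 = ["on", "on", "off", "on", "on", "on", "on", "on", "off", "on"]

agreement = percent_agreement(obs_1, obs_2)
# Compare with the 90% level mentioned in the text
meets_criterion = agreement >= 90.0
print(agreement, meets_criterion)
```

If agreement falls below the chosen level, the usual remedy suggested by the text is to sharpen the behavior definitions and retrain the observers before collecting further data.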


Recording Observations 


If it does not distract or create a barrier between observer and those ob- 
served, simultaneous recording of observations is recommended. This prac- 
tice minimizes the errors that result from faulty memory. There are other 
occasions when recording would more appropriately be done after obser- 
vation. The recording of observations should be done as soon as possible, 
while the details are still fresh in the mind of the observer. But many 
authorities agree that objectivity is more likely when the interpretation of 
the meaning of the behavior described is deferred until a later time, for 
simultaneous recording and interpretation often interfere with objectivity. 
Obviously, a video record permits later recording and coding of the ob- 
served behaviors. 


Systematizing Data Collection 


To aid in the recording of information gained through observation, a 
number of devices have been extensively used. Checklists, rating scales, 
score cards, and scaled specimens provide systematic means of summarizing 
or quantifying data collected by observation or examination. 


Checklist 


The checklist, the simplest of the devices, is a prepared list of behaviors 
or items. The presence or absence of the behavior may be indicated by 
checking yes or no, or the type or number of items may be indicated by 
inserting the appropriate word or number. This simple "laundry-list" type 
of device systematizes and facilitates the recording of observations and helps 
to ensure the consideration of the important aspects of the object or act 
observed. Readers are familiar with checklists prepared to help buyers 
purchase a used car, choose a home site, or buy an insurance policy, which 
indicate characteristics or features that one should bear in mind before 
making a decision. Appendix G illustrates a checklist of this type for the 
evaluation of a research report. 

Checklists also can be used to count the number of behaviors occurring 
in a given time period. In Chapter 6, we described a study by Fantuzzo 
and Clement (1981) in which they observed whether each child was attentive 
every 60 seconds during a class period. They most likely used a checklist 
to mark, and later count, the number of times each child was and was not 
attending to the task. 


Rating Scale 


The rating scale involves qualitative description of a limited number of 
aspects of a thing or of traits of a person. The classifications may be set 
up in five to seven categories in such terms as: 


1. superior     above average    average         fair             inferior 
2. excellent    good             average         below average    poor 
3. always       frequently       occasionally    rarely           never 


Another procedure establishes positions in terms of behavioral or 
situational descriptions. These statements may be much more specific and 
enable the judge to identify more clearly the characteristic to be rated. 
Instead of deciding whether the individual's leadership qualities are su- 
perior or above average, it may be easier to decide between "Always exerts 
a strong influence on his associates," and "Sometimes is able to move others 
to action." 

One of the problems of constructing a rating scale is conveying to the 
rater exactly which quality one wishes evaluated. It is likely that a brief 
behavioral statement is more objective than an adjective that may have no 
universal meaning in the abstract. For this to be considered an effective 
method in observational research, the traits and categories must be very 
carefully defined in observable (behavioral) terms. 

Rating scales have several limitations. In addition to the difficulty of 
clearly defining the trait or characteristic to be evaluated, the halo effect 
causes raters to carry qualitative judgment from one aspect to another. 
Thus there is a tendency to rate a person who has a pleasing personality 
high on other traits such as intelligence or professional interest. This halo 
effect is likely to appear when the rater is asked to rate many factors, on 
a number of which he has no evidence for judgment. This suggests the 
advisability of keeping at a minimum the number of characteristics to be 
rated. 

Another limitation of rating is the raters' tendency to be too generous. 
A number of studies have verified the tendency to rate 60 to 80 percent 
of an unselected group above average in all traits. Rating scales should 
carry the suggestion that raters omit the rating of characteristics that they 
have had no opportunity to observe. 


Score Card 


The score card, similar in some respects to both the checklist and the rating 
scale, usually provides for the appraisal of a relatively large number of 
aspects. In addition, the presence of each characteristic or aspect, or the 
rating assigned to each, has a predetermined point value. Thus the score- 
card rating may yield a total weighted score that can be used in the eval- 
uation of the object observed. Score cards are frequently used in evaluating 
communities, building sites, schools, or textbooks. Accrediting agencies 
sometimes use the score card in arriving at an overall evaluation of a school. 

Score cards have been designed to help estimate the socioeconomic 
status of a family. Such aspects as type of neighborhood, home ownership, 
number of rooms, ownership of a piano, number of books in the library, 
number and type of periodicals subscribed to, presence of a telephone, 
occupations of parents, and organizational membership of the adults are 
all considered significant and have appropriate point values assigned. 

The limitations of the score card are similar to those of the rating 
scale. In addition to the difficulty of choosing, identifying, and quantifying 
the significant aspects of the factor to be observed, there is the suspicion 
that the whole of a thing may be greater than the sum of its parts. 

Colleges and universities are frequently evaluated in terms of such 
elements as size of endowment, proportion of faculty members holding the 
earned doctoral degree, pupil-teacher ratio, and number of volumes in the 
library. Although these aspects are important, the effectiveness of an in- 
stitution may not be accurately appraised by their summation, for certain 
important intangibles do not lend themselves to score-card ratings. 
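
The arithmetic behind a score card can be sketched as follows: each aspect carries a predetermined point value, and the total is the sum of those values weighted by what was observed. The aspects, point values, and observations below are hypothetical and are not drawn from any published score card.

```python
def score_card_total(point_values, observed):
    """Total weighted score for one object rated on a score card.

    `point_values` maps each aspect to its predetermined point value.
    `observed` maps each aspect to 1 or 0 (present or absent) or to a
    numeric rating that multiplies the point value.
    """
    unknown = set(observed) - set(point_values)
    if unknown:
        raise KeyError(f"aspects not on the score card: {sorted(unknown)}")
    return sum(point_values[aspect] * rating for aspect, rating in observed.items())


# Hypothetical score card for a school building
point_values = {"library": 10, "science_laboratory": 8, "gymnasium": 5, "cafeteria": 4}
observed = {"library": 1, "science_laboratory": 1, "gymnasium": 0, "cafeteria": 1}

total = score_card_total(point_values, observed)
maximum = sum(point_values.values())
print(f"{total} of {maximum} points")
```

The single total is convenient for comparison, but, as noted above, it cannot reflect intangibles that were never given a point value.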


The Scaled Specimen 


The scaled specimen, although not frequently encountered in behavioral 
research, provides a method for evaluating certain observed levels of per- 
formance or measures of a quality in question. Testing a solution for acidity 
in a chemistry laboratory involves a pH test. A drop of color indicator is 
introduced into a sample of the solution. The resulting color of the solution 
is matched with the color of one of a set of display vials, indicating the 
degree of acidity of the solution. 

One of the early scaled specimens developed in the field of education 
was the handwriting scale developed by Thorndike. From a large sample 
of handwriting exhibits taken at different ages and grade levels, norms 
were established. The handwriting to be evaluated was then matched with 
the exhibit sample, yielding a measure of handwriting quality. 

The Goodenough-Harris Drawing Test (Harris, 1963) provides a 71- 
point scale with examples for comparing various details of a child's drawing 
of a man, a woman, or a self-portrait. Each point is scored + or 0, indicating 
the presence or absence of a part of body detail in the figure drawn. The 
total of + scores is equated with separate age norms, established for boys 
and girls. The scale is based on the assumption that as individuals mature 
intellectually, they perceive greater detail in the human figure that they 
reveal in their drawings. Variations of the test include Draw a Man, Draw 
a Woman, and Draw Yourself. Studies have reported correlations as high as 
+.60 to +.72 with the Stanford-Binet Intelligence Scale. 


Characteristics of Good Observation 


Observation, as a research data-gathering process, demands rigorous ad- 
herence to the spirit of scientific inquiry. The following standards should 
characterize observers and their observations: 

Observation is carefully planned, systematic, and perceptive. Observ- 
ers know what they are looking for and what is irrelevant in a situation. 
They are not distracted by the dramatic or the spectacular. 

Observers are aware of the wholeness of what is observed. Although 
they are alert to significant details, they know that the whole is often greater 
than the sum of its parts. 

Observers are objective. They recognize their likely biases, and they 
strive to eliminate their influence upon what they see and report. 

Observers separate the facts from the interpretation of the facts. They 
observe the facts and make their interpretation at a later time. 

Observations are checked and verified, whenever possible, by repeti- 
tion or by comparison with those of other competent observers. 

Observations are carefully and expertly recorded. Observers use ap- 
propriate instruments to systematize, quantify, and preserve the results of 
their observations. 

Observations are collected in such a way as to make sure that they 
are valid and reliable. 


INQUIRY FORMS: THE QUESTIONNAIRE 


The general category of inquiry forms includes data-gathering instruments 
through which respondents answer questions or respond to statements in 
writing. A questionnaire is used when factual information is desired. When 
opinions rather than facts are desired, an opinionnaire or attitude scale is 
used. 

Questionnaires administered personally to groups of individuals have 
a number of advantages. The person administering the instrument has an 
opportunity to establish rapport, explain the purpose of the study, and 
explain the meaning of items that may not be clear. The availability of a 
number of respondents in one place makes possible an economy of time 
and expense and provides a high proportion of usable responses. It is likely 
that a principal would get completely usable responses from teachers in 
the building, or a teacher from students in the classroom. However, in- 
dividuals who have the desired information cannot always be contacted 
personally without the expenditure of a great deal of time and money in 
travel. It is in such situations that the mailed questionnaire may be useful. 
The mailed questionnaire is one of the most used and probably most crit- 
icized data-gathering devices. It has been referred to as the lazy person's 
way of gaining information, although the careful preparation of a good 
questionnaire takes a great deal of time, ingenuity, and hard work. There 
is little doubt that the poorly constructed questionnaires that flood the 
mails have created a certain amount of contempt. This is particularly true 
when the accompanying letter pleads that the sender needs the information 
to complete the requirements for a graduate course, thesis, or dissertation. 
The recipient's reaction may be, "Why should I go to all this trouble to 
help this person get a degree?" 

Filling out lengthy questionnaires takes a great deal of time and effort, 
a favor that few senders have any right to expect of strangers. The unfa- 
vorable reaction is intensified when the questionnaire is long, the subject 
trivial, the items vaguely worded, and the form poorly organized. The poor 
quality of so many mailed questionnaires helps to explain why so small a 
proportion is returned. As a result of low response rates, often less than 
40 percent, the data obtained are often of limited validity. The information 
in the unreturned questionnaires might have changed the results of the 
investigation materially. The very fact of no response might imply certain 
types of reactions, reactions that can never be included in the summary of 
data. 

Unless one is dealing with a group of respondents who have a genuine 
interest in the problem under investigation, know the sender, or have some 
common bond of loyalty to a sponsoring institution or organization, the 
rate of returns is frequently disappointing and provides a flimsy basis for 
generalization. 

Although the foregoing discussion may seem to discredit the ques- 
tionnaire as a respectable research technique, we have tried to consider the 
abuse or misuse of the device. Actually the questionnaire has unique ad- 
vantages, and properly constructed and administered, it may serve as a 
most appropriate and useful data-gathering device in a research project. 


The Closed Form 


Questionnaires that call for short, check-mark responses are known as the 
restricted or closed-form type. Here you mark a yes or no, write a short re- 
sponse, or check an item from a list of suggested responses. The following 
example illustrates the closed-form item: 


Why did you choose to do your graduate work at this university? Kindly 
indicate three reasons in order of importance, using the number 1 for the 
most important, 2 for the second most important, and 3 for the third most 
important. 


                                          RANK
(a) Convenience of transportation         ____
(b) Advice of a friend                    ____
(c) Reputation of institution             ____
(d) Expense factor                        ____
(e) Scholarship aid                       ____
(f) Other ____________________            ____
          (kindly indicate)


Even when using the closed form, it is well to provide for unanticipated 
responses. Providing an "other" category permits respondents to indicate 
what might be their most important reason, one that the questionnaire 
builder had not anticipated. Note the instruction to rank choices in order 
of importance, which enables the tabulator to properly classify all responses. 

For certain types of information the closed-form questionnaire is en- 
tirely satisfactory. It is easy to fill out, takes little time, keeps the respondent 
on the subject, is relatively objective, and is fairly easy to tabulate and 
analyze. 


The Open Form 


The open-form or unrestricted questionnaire calls for a free response in the 
respondent's own words. The following open-form item seeks the same 
type of information as did the closed-form item: 


Why did you choose to do your graduate work at this university? 


Note that no clues are given. The open form probably provides for 
greater depth of response. The respondents reveal their frame of reference 
and possibly the reasons for their responses. But because it requires greater 
effort on the part of the respondents, returns are often meager. Also, the 
open-form item can sometimes be difficult to interpret, tabulate, and sum- 
marize in the research report. 

Many questionnaires include both open- and closed-type items. Each 
type has its merits and limitations, and the questionnaire builder must 
decide which type is more likely to supply the information wanted. 


Improving Questionnaire Items 


Inexperienced questionnaire makers are likely to be naive about the clarity 
of their questions. One author of this book recalls a brilliant graduate 
student who submitted a questionnaire for his approval. She was somewhat 
irritated by his subsequent questions and suggestions, remarking that any- 
one with any degree of intelligence should know what she meant. At the 
advisor's suggestion, she duplicated some copies and personally adminis- 
tered the questionnaire to a graduate class in research. 

She was swamped with questions of interpretation, many of which she 
could not answer clearly. There was considerable evidence of confusion 
about what she wanted to know. After she had collected the completed 
copies and had tried to tabulate the responses, she began to see the 
questionnaire's faults. Even her directions and explanation in class had failed 
to clarify the ambiguous intent of her questionnaire. Her second version 
was much improved. 

Many beginning researchers are not really sure what they want to 
know. They use a shotgun approach, attempting to cover their field broadly 
in the hope that some of the responses will provide the answers for which 
they are groping. Unless researchers know exactly what they want, however, 
they are not likely to ask the right questions or to phrase them properly. 

In addition to the problem of knowing what one wants, there is the 
difficulty of wording the questionnaire clearly. The limitations of words 
are particular hazards in the questionnaire. The same words mean different 
things to different people. After all, even questionnaire makers have their 
own interpretation, and the respondents may have many different inter- 
pretations. In the interview, as in conversation, we are able to clear up 
misunderstandings by restating our question, by inflection of the voice, by 
suggestions, and by a number of other devices. But the written question 
stands by itself, often ambiguous and misunderstood. 

A simple example illustrates the influence of voice inflection alone. 
Consider the following question. Read it over, each time emphasizing the 
underlined word, noting how the change in inflection alters the meaning. 


_Were_ you there last night? 
Were _you_ there last night? 
Were you _there_ last night? 
Were you there _last_ night? 
Were you there last _night_? 


Questionnaire makers must depend on written language alone. Ob- 
viously they cannot be too careful in phrasing questions to insure their 
clarity of purpose. Although there are no certain ways of producing fool- 
proof questions, certain principles can be employed to make questionnaire 
items more precise. A few are suggested here with the hope that students 
constructing questionnaires and opinionnaires will become critical of their 
first efforts and strive to make each item as clear as possible. 


Define or qualify terms that could easily be misinterpreted. 


What is the value of your house? 


The meaning of the term value is not clear. It could imply several 
different meanings: the assessed value for tax purposes, what it would sell 
for on the present market, what you would be willing to sell it for, what it 
would cost to replace, or what you paid for it. These values may differ 
considerably. It is essential to frame questions specifically, such as, "What 
is the present market value of your house?" 

As simple a term as age is often misunderstood. When is an individual 
twenty-one? Most people would say that a person is twenty-one from the 
day of the twenty-first birthday until the day of the twenty-second. But an 
insurance company considers a person twenty-one from age twenty and 
six months until age twenty-one and six months. Perhaps this question 
could be clarified by asking age to nearest birthday or date of birth. 

Hundreds of words are ambiguous because of their many interpretations. 
One has only to think of such words and phrases as curriculum, 
democracy, progressive education, cooperation, and integration, and even such 
simple words as how much and now. To the question, "What work are you 
doing now?" the respondent might be tempted to answer, "Filling out your 
foolish questionnaire." 


Be careful in using descriptive adjectives and adverbs that have no agreed-upon 
meaning. This fault is frequently found in rating scales as well as in 
questionnaires. Frequently, occasionally, and rarely do not have the same 
meanings to different persons (Hakel, 1968). One respondent's occasionally 
may be another's rarely. Perhaps a stated frequency (times per week or times 
per month) would make this classification more precise. 


Beware of double negatives. Underline negatives for clarity. 


Are you opposed to not requiring students to take showers after gym 
class? 

Federal aid should not be granted to those states in which education 
is not equal regardless of race, creed, or color. 


Be careful of inadequate alternatives. 


Married? Yes ____ No ____ 


Does this question refer to present or former marital status? How 
would the person answer who is widowed, separated, or divorced? 


How late at night do you permit your children to watch television? 


There may be no established family policy. If there is a policy, it may 
differ for children of different ages. It may be different for school nights 
or for Friday and Saturday nights, when watching a late movie may be 
permitted. 


Avoid the double-barreled question. 


Do you believe that gifted students should be placed in separate groups 
for instructional purposes and assigned to special schools? 


One might agree on the advisability of separate groups for instruc- 
tional purposes but be very much opposed to the assignment of gifted 
students to special schools. Two separate questions are needed. 


Underline a word if you wish to indicate special emphasis. 


A parent should not be told his child’s IQ score. 
Should all schools offer a modern foreign language? 


When asking for ratings or comparisons, a point of reference is necessary. 


How would you rate this student teacher's classroom teaching? 
Superior Average Below average 


With whom is the student teacher to be compared—an experienced 
teacher, other student teachers, former student teachers —or should the 
criterion be what a student teacher is expected to be able to do? 

Avoid unwarranted assumptions. 


Are you satisfied with the salary raise that you received last year? 


A no answer might mean either I did not get a raise or that I did get 
a raise but am not satisfied. 


Do you feel that you benefited from the spankings that you received 
as a child? 


A no response could mean either that the spankings did not help me, 
or that my parents did not administer corporal punishment. These un- 
warranted assumptions are nearly as bad as the classic, “Have you stopped 
beating your wife?” 

Phrase questions so that they are appropriate for all respondents. 


What is your monthly teaching salary? 


Some teachers are paid on a nine-month basis, some on ten, some on 
eleven, and some on twelve. Three questions would be needed. 


Your salary per month? 
Number of months in school term? 
Number of salary payments per year? 


Design questions that will give a complete response. 


Do you read the Indianapolis Star? Yes No 


A yes or no answer would not reveal much information about the 
reading habits of the respondent. The question might be followed with an 
additional item, as in Figure 7-1. 


FIGURE 7-1 Sample questionnaire item. 


If your answer is Yes, kindly check how often and what sections of the Star you read. 

National and international news 
State and local news 
Advertising 
Syndicated features 
Other (specify) 


Provide for the systematic quantification of responses. The type of question 
that asks respondents to check a number of items from a list is difficult to 
summarize, especially if not all respondents check the same number. One 
solution is to ask respondents to rank, in order of preference, a specific 
number of responses. 


What are your favorite television programs? Rank in order of preference 
your first, second, third, fourth, and fifth choices. 


The items can then be tabulated by inverse weightings. 


1st choice    5 points 
2nd choice    4 points 
3rd choice    3 points 
4th choice    2 points 
5th choice    1 point 


The relative popularity of the programs could be described for a 
group in terms of total weighted scores, the most popular having the largest 
total. 
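
The inverse weighting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `weighted_totals`, the program names, and the three sample ballots are invented for the example and do not come from any actual study.

```python
from collections import defaultdict

# Inverse weights from the text: 1st choice = 5 points ... 5th choice = 1 point.
WEIGHTS = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}


def weighted_totals(ballots):
    """Sum inverse-weighted points per program.

    Each ballot is a list of program names in order of preference
    (first choice first); at most five choices are counted.
    """
    totals = defaultdict(int)
    for ballot in ballots:
        for rank, program in enumerate(ballot[:5], start=1):
            totals[program] += WEIGHTS[rank]
    # Most popular program (largest total) first; ties broken alphabetically.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical responses from three respondents (program names are invented).
ballots = [
    ["News Hour", "Quiz Time", "Late Movie", "Nature Show", "Talk Night"],
    ["Quiz Time", "News Hour", "Nature Show", "Late Movie", "Talk Night"],
    ["Quiz Time", "Late Movie", "News Hour", "Talk Night", "Nature Show"],
]

for program, score in weighted_totals(ballots):
    print(f"{program:12s} {score}")
```

With these invented ballots, "Quiz Time" receives 4 + 5 + 5 = 14 points and heads the list, followed by "News Hour" with 5 + 4 + 3 = 12.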

Consider the possibility of classifying the responses yourself, rather than having 
the respondent choose categories. If students were asked to classify their 
father's occupation in one of the following categories, the results might be quite 
unsatisfactory: 


Unskilled labor 
Skilled labor 
Clerical work 
Managerial work 
Profession 
Proprietorship 


It is likely that by asking the children one or two short questions about 
their father's work, one could classify it more accurately: 


1. At what place does your father work? 
2. What kind of work does he do? 

Very often, a researcher wants to gather both information (facts) and 
attitudes (opinions). This allows later analyses that can determine whether 
attitudes are related to personal characteristics such as age, sex, or race. 
Figure 7-2 is an example of just such a combination. This 
questionnaire/opinionnaire collects information about the individual and then asks 
for the opinion of the person regarding factors that contribute to teacher morale. 


FIGURE 7-2 Teacher morale questionnaire-opinionnaire. 


 1. Male ____ Female ____ 
 2. ____ 
 3. Marital status: single ____ married ____ divorced/separated ____ 
 4. Number of dependent children ____; their ages ____ 
 5. Number of other dependents ____ 
 6. Highest degree held ____ 
 7. Years of teaching experience ____ 
 8. Years of teaching at present school ____ 
 9. Teaching level: primary ____ intermediate ____ upper grades ____ 
    Jr. H.S. ____ Sr. H.S. ____; If secondary, your major teaching area ____ 
10. Enrollment of your school ____ 
11. Your average class size ____ 
12. Population of your community or school district ____ 
13. Your principal is: male ____ female ____ 


In the following questions kindly check the appropriate column: 

    a. excellent    b. good    c. fair    d. poor 

14. How does your salary schedule compare with those of similar school 
    districts? 
15. How would you rate your principal on these traits? 
        competence 
        friendliness 
        helpfulness 
        ability to inspire 
        encourage creativity 
        availability 
16. How would you rate the consulting or advisory services that you receive? 
17. Provision made for teacher free time 
        relaxation 
        preparation 
        lunch 
        conferences 
18. How would you rate your faculty lounge? 
19. How would you rate your faculty professional library? 
        books 
        periodicals 
        references 
20. How would you evaluate the adequacy of teaching materials and supplies? 
        textbooks 
        references 
        AV aids 
        supplies 
21. How would you evaluate the assignment of your nonteaching duties? 
    (leave blank if item does not apply) 
        reports 
        meetings 
        supervision of: 
            halls 
            lunchroom 
            playground 
            study hall 
        extra-class organizations 
22. How would you rate the compatibility of your faculty? 
23. How would you rate the parent support of your school? 
24. How would you rate your morale as a teacher? 
25. Kindly rank in order of importance to you at least five factors that you would 
    consider most important in increasing your morale or satisfaction with your 
    working conditions: Rank 1, most important, 2 next in importance, etc. 

    ____ a. higher salary 
    ____ b. smaller class size 
    ____ c. more free time 
    ____ d. more adequate faculty lounge 
    ____ e. more compatible faculty 
    ____ f. more adequate teaching materials 
    ____ g. more effective principal 
    ____ h. better consulting services 
    ____ i. more effective faculty meetings 
    ____ j. assistance of a teacher aide 
    ____ k. more attractive classroom/building 
    ____ l. fewer reports to make out 
    ____ m. fewer nonteaching duties 
    ____ n. better provision for atypical students 
    ____ o. more participation in policy making 
    ____ p. fewer committee meetings 
    ____ q. teaching in a higher socioeconomic area 
    ____ r. teaching in a lower socioeconomic area 
    ____ s. other (kindly specify) 

On the back of this sheet kindly add any comments that you believe would 
more adequately express your feelings of satisfaction or dissatisfaction with 
teaching. 


CHARACTERISTICS OF A GOOD 
QUESTIONNAIRE 


1. It deals with a significant topic, one the respondent will recognize as 
important enough to warrant spending his or her time on. The significance 
should be clearly and carefully stated on the questionnaire, 
or in the letter that accompanies it. 

2. It seeks only that information which cannot be obtained from other 
sources such as school reports or census data. 

3. It is as short as possible, and only long enough to get the essential 
data. Long questionnaires frequently find their way into the wastebasket. 

4. It is attractive in appearance, neatly arranged, and clearly duplicated 
or printed. 

5. Directions for a good questionnaire are clear and complete. Important 
terms are defined. Each question deals with a single idea and is worded 
as simply and clearly as possible. The categories provide an opportunity 
for easy, accurate, and unambiguous responses. 

6. The questions are objective, with no leading suggestions as to the 
responses desired. Leading questions are just as inappropriate on a 
questionnaire as they are in a court of law. 

7. Questions are presented in good psychological order, proceeding from 
general to more specific responses. This order helps respondents to 
organize their own thinking so that their answers are logical and 
objective. It may be well to present questions that create a favorable 
attitude before proceeding to those that may be a bit delicate or in- 
timate. If possible, annoying or embarrassing questions should be 
avoided. 

8. It is easy to tabulate and interpret. It is advisable to preconstruct a 
tabulation sheet, anticipating how the data will be tabulated and in- 
terpreted, before the final form of the questionnaire is decided upon. 
This working backward from a visualization of the final analysis of 
data is an important step for avoiding ambiguity in questionnaire 
form. If computer tabulation is to be used, it is important to designate 
code numbers for all possible responses to permit easy transference 
to a computer program’s format. 
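
As a rough illustration of designating code numbers in advance, the sketch below builds a small codebook and tallies responses against it. The item names, the particular code numbers, and the use of 9 for "no answer" are assumptions made for the example, not a prescribed coding scheme.

```python
from collections import Counter

# Hypothetical codebook: every permitted response gets a code number,
# including a code (9) for "no answer" so that nothing is left uncoded.
CODEBOOK = {
    "marital_status": {"single": 1, "married": 2, "divorced/separated": 3, "": 9},
    "salary_rating": {"excellent": 1, "good": 2, "fair": 3, "poor": 4, "": 9},
}


def encode(item, response):
    """Return the code number for a response, rejecting anything not in the codebook."""
    key = response.strip().lower()
    try:
        return CODEBOOK[item][key]
    except KeyError:
        raise ValueError(f"no code defined for {item!r} response {response!r}") from None


def tabulate(item, responses):
    """Count how many respondents fall under each code for one item."""
    return Counter(encode(item, r) for r in responses)


# Invented responses to the salary item; one respondent left it blank.
tally = tabulate("salary_rating", ["good", "Fair", "good", "", "poor"])
print(sorted(tally.items()))
```

Writing the codebook before the questionnaire is printed serves the same purpose as the preconstructed tabulation sheet: a response that cannot be coded shows up as an error during planning rather than after the returns are in.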


Preparing and Administering the Questionnaire 


Get all the help you can in planning and constructing your questionnaire. 
Study other questionnaires, and submit your items for criticism to other 
members of your class or your faculty, especially those who have had ex- 
perience in questionnaire construction. 

In designing an inquiry form (questionnaire or opinionnaire), it is 
advisable to use a separate card or slip for each item. As the instrument is 
being developed, items can be refined, revised, or replaced by better items 
without recopying the entire instrument. This procedure also provides 
flexibility in arranging items in the most appropriate psychological order 
before the instrument is put into its final form. 

Try out your questionnaire on a few friends and acquaintances. When 
you do this personally, you may find that a number of your items are 
ambiguous. What may seem perfectly clear to you may be confusing to a 
person who does not have the frame of reference that you have gained 
from living with and thinking about an idea over a long period. It is also 
a good idea to “pilot test” the instrument with a small group of persons 
similar to those who will be used in the study. 

These dry runs will be well worth the time and effort. They may 
reveal defects that can be corrected before the final form is printed and 
committed to the mails. Once the instrument has been sent out, it is too 
late to remedy its defects. 

Choose respondents carefully. It is important that questionnaires be 
sent only to those who possess the desired information and are likely to be 
sufficiently interested to respond conscientiously and objectively. A prelim- 
inary card, asking whether the individual would be willing to participate 
in the proposed study, is recommended by some research authorities. This 
is not only a courteous approach but a practical way of discovering those 
who will cooperate in furnishing the desired information. 

In a study on questionnaire returns, See (1957) discovered that a 
better return was obtained when the original request was sent to the ad- 
ministrative head of an organization rather than directly to the person who 
had the desired information. It is likely that when a superior officer gives 
a staff member a questionnaire to fill out, there is an implied feeling of 
obligation. 


Getting permission. If the questionnaire is to be used in a public school, 
it is essential that approval of the project be secured from the principal, 
who may then wish to secure approval from the superintendent of schools. 
Schools are understandably sensitive to public relations. One can imagine 
the unfavorable publicity that might result from certain types of studies 
made by individuals not officially designated to conduct the research. School 
officials may also want to prevent the exploitation of teachers and pupils 
by amateur researchers, whose activities require an excessive amount of 
time and effort in activities not related to the purposes of the school. 

Parental permission may also need to be secured. Students should be 
informed that participation is voluntary. Particularly if sensitive questions 
(e.g., about drug use) are to be asked, parental and student consent is 
essential. 

If the desired information is delicate or intimate in nature, consider 
the possibility of providing for anonymous responses. The anonymous 
instrument is most likely to produce objective and honest responses. There 
are occasions, however, for purposes of classification or for a possible fol- 
low-up meeting, when it might be necessary to identify the respondents. 
If identification is needed, it is essential to convince the respondents that 
their responses will be held in strict confidence and that their answers will 
in no way jeopardize the status and security of their position. 

Try to get the aid of sponsorship. Recipients are more likely to answer 
if a person, organization, or institution of prestige has endorsed the project. 
Of course, it is unethical to claim sponsorship unless it has been expressly 
given. 


The cover letter. Be sure to include a courteous, carefully constructed 
cover letter to explain the purpose of the study. The letter should promise 
some sort of inducement to the respondent for compliance with the request. 
Commercial agencies furnish rewards in goods or money. In educational 
circles, a summary of questionnaire results is considered an appropriate 
reward, a promise that should be scrupulously honored after the study has 
been completed. 

The cover letter should assure the respondent that all information 
will be held in strict confidence or that the questionnaire is anonymous. 
And the matter of sponsorship might well be mentioned. Of course, a 
stamped, addressed return envelope should be included. To omit this would 
virtually guarantee that many of the questionnaires would go into the 
wastebasket. Some researchers suggest that two copies of the questionnaire 
be sent, one to be returned when completed and the other for the re- 
spondent's own file. 


Follow-up procedures. Recipients are often slow to return completed 
questionnaires. To increase the number of returns, a vigorous follow-up 
procedure may be necessary. A courteous postcard reminding the recipient 
that the completed questionnaire has not been received may bring in 
some additional responses. This reminder will be effective with those who 
have just put off filling out the document or have forgotten to mail it. A 
further step in the follow-up process may involve a personal letter of re- 
minder. In extreme cases a telegram, phone call, or personal visit may 
bring additional responses. In some cases it may be appropriate to send 
another copy of the questionnaire with the follow-up letter. However, the 
researcher must know who has already responded so as not to receive 
potential duplicates. 

It is difficult to estimate, in the abstract, what percentage of ques- 
tionnaire responses is to be considered adequate. The importance of the 
project, the quality of the questionnaire, the care used in selecting recipi- 
ents, the time of year, and many other factors may be significant in deter- 
mining the proportion of responses. In general, the smaller the percentage 
of responses, the smaller the degree of confidence one may place in the 
data collected. Of course, objectivity of reporting requires that the pro- 
portion of responses received should always be included in the research 
report. Babbie (1973) suggests that a response rate of 50 percent is ade- 
quate, 60 percent good, and 70 percent very good. 
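
The response-rate guideline can be expressed as a small calculation. The sketch below is illustrative only: the function names and the mailing figures are invented, and the label used for rates under 50 percent is an assumption, since the guideline quoted above gives none.

```python
def response_rate(returned, mailed):
    """Percentage of mailed questionnaires that were returned."""
    if mailed <= 0:
        raise ValueError("mailed must be positive")
    return 100.0 * returned / mailed


def babbie_label(rate):
    """Label a response rate using the thresholds attributed to Babbie (1973)."""
    if rate >= 70:
        return "very good"
    if rate >= 60:
        return "good"
    if rate >= 50:
        return "adequate"
    # The text gives no label below 50 percent; this wording is an assumption.
    return "below adequate"


# Hypothetical mailing: 240 questionnaires sent, 132 returned.
rate = response_rate(132, 240)
print(f"{rate:.1f} percent returned: {babbie_label(rate)}")
```

Whatever label applies, the proportion itself still belongs in the research report.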


Validity and Reliability of Questionnaires 


All too rarely do questionnaire designers deal consciously with the degree 
of validity or reliability of their instrument. Perhaps this is one reason why 
so many questionnaires are lacking in these qualities. It must be recognized, 
however, that questionnaires, unlike psychological tests and inventories, 
have a very limited purpose. They are often one-time data-gathering de- 
vices with a very short life, administered to a limited population. There 
are ways, however, to improve both validity and reliability of questionnaires. 

Basic to the validity of a questionnaire is asking the right questions, 
phrased in the least ambiguous way. In other words, do the items sample 
a significant aspect of the purpose of the investigation? 

The meaning of all terms must be clearly defined so that they have 
the same meaning to all respondents. Researchers need all the help they 
can get; suggestions from colleagues and experts in the field of inquiry 
may reveal ambiguities that can be removed or items that do not contribute 
to a questionnaire's purpose. The panel of experts may rate the instrument 
in terms of how effectively it samples significant aspects of its purpose, 
providing estimates of content validity. 

It is possible to estimate the predictive validity of some types of ques- 
tionnaires by follow-up observations of respondent behavior at the present 
time or at some time in the future. In some situations, overt behavior can 
be observed without invading the privacy of respondents. A comparison 
of questionnaire responses with voting data on a campus or community 
election may provide a basis for estimating predictive validity. 

Reliability of questionnaires may be inferred by a second administra- 
tion of the instrument, comparing the responses with those of the first. 
Reliability may also be estimated by comparing responses of an alternate 
form with the original form. 
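
One simple way to compare a second administration with the first is to count how many items a respondent answered identically both times. The sketch below assumes closed-form items and uses invented answers; percentage agreement is only one of several possible indices and is not prescribed by the discussion above.

```python
def percent_agreement(first, second):
    """Percentage of items answered identically on both administrations."""
    if len(first) != len(second) or not first:
        raise ValueError("both administrations must cover the same, nonempty set of items")
    same = sum(a == b for a, b in zip(first, second))
    return 100.0 * same / len(first)


# Hypothetical answers of one respondent to four closed-form items,
# collected on two occasions.
first_administration = ["yes", "no", "yes", "yes"]
second_administration = ["yes", "no", "no", "yes"]

print(percent_agreement(first_administration, second_administration))
```

Here three of the four answers match, so the agreement is 75 percent. Averaging this figure over all respondents gives a rough indication of how stable the responses are.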


INQUIRY FORMS: THE OPINIONNAIRE 


An information form that attempts to measure the attitude or belief of an 
individual is known as an opinionnaire, or attitude scale. Because the terms 
opinion and attitude are not synonymous, clarification is necessary. 

How people feel, or what they believe, is their attitude. But it is 
difficult, if not impossible, to describe and measure attitude. Researchers 
must depend upon what people say are their beliefs and feelings. This is 
the area of opinion. Through the use of questions, or by getting people's 
expressed reaction to statements, a sample of their opinions is obtained. 
From this statement of opinion, one may infer or estimate their attitude— 
what they really believe. 

Inferring attitude from expressed opinion has many limitations. Peo- 
ple may conceal their attitudes and express socially acceptable opinions. 
They may not really know how they feel about a social issue, never having 
given the idea serious consideration. People may be unaware of their at- 
titude about a situation in the abstract; until confronted with a real situation, 
they may be unable to predict their reaction or behavior. 

Even behavior itself is not always a true indication of attitude. When 
politicians kiss babies, their behavior may not be a true expression of af- 
fection toward infants. Social custom or the desire for social approval makes 
many overt expressions of behavior mere formalities, quite unrelated to 
people's inward feelings. Even though there is no sure method of describing 
and measuring attitude, the description and measurement of opinion may, 
in many instances, be closely related to people's real feelings or attitudes. 

With these limitations in mind, psychologists and sociologists have 
explored an interesting area of research, basing their data upon people's 
expressed opinions. Several methods have been employed: 


1. Asking people directly how they feel about a subject. This technique 
may employ a schedule or questionnaire of the open or closed form. 
It may employ the interview process, in which the respondents express 
their opinions orally. 

2. Asking people to check in a list the statements with which they agree. 

3. Asking people to indicate their degree of agreement or disagreement 
with a series of statements about a controversial subject. 

4. Inferring their attitudes from reactions to projective devices, through 
which they may reveal attitudes unconsciously. (A projective device is a 
data-gathering instrument that conceals its purpose so that the sub- 
jects cannot guess how they should respond to appear in their best 
light. Thus their real characteristics are revealed.) 


Three procedures for eliciting opinions and attitudes have been used 
extensively in opinion research, and they warrant a brief description. 


Thurstone Technique 


The first method of attitude assessment is known as the Thurstone Tech- 
nique of Scaled Values (Thurstone & Chave, 1929). A number of state- 
ments, usually twenty or more, are gathered that express various points of 
view toward a group, institution, idea, or practice. They are then submitted 
to a panel of judges, each of whom arranges them in eleven groups ranging 
from one extreme to another in position. This sorting by each judge yields 
a composite position for each of the items. When there has been marked 
disagreement among the judges in assigning a position to an item, that 
item is discarded. For items that are retained, each is given its median scale 
value (see Chapter 8) between one and eleven as established by the panel. 

The list of statements is then given to the subjects, who are asked 
to check the statements with which they agree. The median value of 
the statements that they check establishes their score, or quantifies their 
opinion. 
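
The two uses of the median in the Thurstone procedure can be sketched as follows. The statement labels, the judges' placements, and the scale values are invented for illustration; only the use of the median comes from the description above.

```python
from statistics import median


def item_scale_value(judge_placements):
    """Scale value of one statement: the median of the groups (1-11) the judges chose."""
    return median(judge_placements)


def thurstone_score(checked, scale_values):
    """Respondent's score: the median scale value of the statements checked."""
    if not checked:
        raise ValueError("respondent checked no statements")
    return median(scale_values[s] for s in checked)


# Hypothetical scale values for six retained statements (labels are invented).
SCALE_VALUES = {"s1": 1.5, "s2": 3.2, "s3": 5.0, "s4": 6.8, "s5": 8.9, "s6": 10.4}

# Five judges placed one statement in groups 4, 5, 5, 6, and 7.
print(item_scale_value([4, 5, 5, 6, 7]))

# A respondent who agrees with statements s3, s4, and s5.
print(thurstone_score(["s3", "s4", "s5"], SCALE_VALUES))
```

In this invented case the statement receives a scale value of 5, and the respondent's score is 6.8, the middle of the three values checked.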


Likert Method 


The second method —the Likert Method of Summated Ratings— can be 
performed without a panel of judges and has yielded scores very similar 
to those obtained by the Thurstone method. The coefficient of correlation 
(see Chapter 8) between the two scales was reported as high as +.92 in 
one study (Edwards & Kenney, 1946). Since the Likert-type scale takes 
much less time to construct, it offers an interesting possibility for the student 
of opinion research. 

The first step in constructing a Likert-type scale is to collect a number 
of statements about a subject. The correctness of the statements is not 
important, as long as they express opinions held by a substantial number 
of people. It is important that they express definite favorableness or un- 
favorableness to a particular point of view and that the number of favorable 
and unfavorable statements is approximately equal. 

After the statements have been gathered, a trial test should be ad- 
ministered to a number of subjects. Only those items that correlate with 
the total test should be retained. This testing for internal consistency will 
help to eliminate statements that are ambiguous or that are not of the same 
type as the rest of the scale. 
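
The test for internal consistency can be sketched by correlating each statement with the total score. Everything specific in this example is an assumption: the trial data are invented, the cutoff of .30 is chosen only for illustration (the discussion above does not prescribe one), and unfavorable statements are presumed to have been reverse-scored already.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson coefficient of correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when every value is the same")
    return sxy / sqrt(sxx * syy)


def item_total_correlations(rows):
    """Correlate each item with the respondents' total scores.

    rows holds one list of item scores (1-5) per respondent.
    """
    totals = [sum(row) for row in rows]
    n_items = len(rows[0])
    return [pearson_r([row[i] for row in rows], totals) for i in range(n_items)]


# Hypothetical trial test: five respondents, four statements.
trial = [
    [5, 4, 5, 2],
    [4, 4, 4, 3],
    [2, 2, 1, 3],
    [1, 2, 2, 2],
    [3, 3, 3, 5],
]
CUTOFF = 0.30  # assumed retention threshold; not taken from the text

correlations = item_total_correlations(trial)
retained = [i for i, r in enumerate(correlations) if r >= CUTOFF]
print([round(r, 3) for r in correlations], retained)
```

With these invented data the first three statements correlate above .90 with the total, while the fourth correlates at about .24 and would be dropped under the assumed cutoff.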

The attitude or opinion scale may be analyzed in several ways. The 
simplest way to describe opinion is to indicate percentage responses for 
each individual statement. For this type of analysis by item, three re- 
sponses—agree, undecided, and disagree—are preferable to the usual five. 
If a Likert-type scale is used, it may be possible to report percentage re- 
sponses by combining the two outside categories: “strongly agree” and 
“agree”; “disagree” and “strongly disagree.” 


strongly agree    agree    undecided    disagree    strongly disagree 


For example, 70 percent of the male respondents agree with the 
statement, “Merit rating will tend to encourage conformity and discourage 
initiative.” 
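
Combining the outside categories and reporting percentages for a single statement might be sketched as follows. The ten responses are invented so that the result matches the 70 percent figure in the example above; the function name is likewise hypothetical.

```python
from collections import Counter

# Combine the two outside categories on each side, as described in the text.
COLLAPSE = {
    "strongly agree": "agree",
    "agree": "agree",
    "undecided": "undecided",
    "disagree": "disagree",
    "strongly disagree": "disagree",
}


def percentage_by_item(responses):
    """Percentage of respondents who agree, are undecided, or disagree with one statement."""
    if not responses:
        raise ValueError("no responses to summarize")
    counts = Counter(COLLAPSE[r] for r in responses)
    n = len(responses)
    return {cat: round(100.0 * counts[cat] / n, 1)
            for cat in ("agree", "undecided", "disagree")}


# Hypothetical responses of ten male respondents to the merit-rating statement.
sample = (["strongly agree"] * 3 + ["agree"] * 4 + ["undecided"]
          + ["disagree"] + ["strongly disagree"])

print(percentage_by_item(sample))
```

Seven of the ten invented respondents fall in the combined "agree" category, which is reported as 70 percent.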

The Likert scaling technique assigns a scale value to each of the five 
responses. Thus the instrument yields a total score for each respondent, 
and a discussion of each individual item, although possible, is not necessary. 
Starting with a particular point of view, all statements favoring the above 
position are scored: 


                          SCALE VALUE 
a. strongly agree              5 
b. agree                       4 
c. undecided                   3 
d. disagree                    2 
e. strongly disagree           1 

For statements opposing this point of view, the items are scored in 
the opposite order: 


SCALE VALUE 
a. strongly agree 1 
b. agree 2 
c. undecided 3 
d. disagree 4 
e. strongly disagree 5 
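
The two scoring keys can be applied mechanically. The following is a minimal sketch, assuming a hypothetical four-item opinionnaire in which item 2 opposes the point of view; the function name `total_score` and the sample answers are not from the text.

```python
# Scale values for statements favoring the point of view.
FAVORABLE = {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1}
# Scale values for statements opposing it (opposite order).
OPPOSING = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}


def total_score(answers, opposing_items):
    """Sum the scale values for one respondent.

    answers maps item number -> response letter (a to e);
    opposing_items is the set of item numbers scored in opposite order.
    """
    total = 0
    for item, letter in answers.items():
        key = OPPOSING if item in opposing_items else FAVORABLE
        total += key[letter]
    return total


# Hypothetical respondent: strongly agrees with favorable item 1 and
# strongly disagrees with opposing item 2, so both earn 5 points.
answers = {1: "a", 2: "e", 3: "b", 4: "c"}
score = total_score(answers, opposing_items={2})
print(score)
```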


The opinionnaire illustrated in Figure 7-3 attempts to measure Christian 
religious orthodoxy or conservatism. It is apparent that this type of 
instrument could be used to measure opinion in many controversial areas: 
racial integration, merit rating of teachers, universal military training, and 
many others. The test scores obtained on all the items would then measure 
the respondent's favorableness toward the given point of view. 

Figure 7-4 illustrates an instrument that was used to seek the opinions 
of a group of classroom teachers toward merit rating. 

If an opinionnaire consisted of 30 statements or items, the following 
score values would be revealing: 


30 x 5 = 150     Most favorable response possible 
30 x 3 =  90     A neutral attitude 
30 x 1 =  30     Most unfavorable attitude 


The scores for any individual would fall between 30 and 150: above 
90 if opinions tended to be favorable to the given point of view, and below 
90 if opinions tended to be unfavorable. 
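
The same arithmetic holds for an opinionnaire of any length. As a small sketch (the function names are illustrative, and a five-point scale with 3 as the neutral value is assumed):

```python
def score_bounds(n_items, low=1, neutral=3, high=5):
    """Return (lowest, neutral, highest) possible totals for n_items."""
    return n_items * low, n_items * neutral, n_items * high


def describe(total, n_items):
    """Classify a total score relative to the neutral point."""
    lowest, midpoint, highest = score_bounds(n_items)
    if not lowest <= total <= highest:
        raise ValueError("total lies outside the possible range")
    if total > midpoint:
        return "favorable"
    if total < midpoint:
        return "unfavorable"
    return "neutral"


print(score_bounds(30))
print(describe(120, 30))
```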

It would be wise to conclude this discussion with a recognition of the 
limitations of this type of opinion measure. Obviously it is somewhat inexact 
and fails to measure opinion with the precision one would desire. There 
is no basis for belief that the five positions indicated on the scale are equally 
spaced. The interval between "strongly agree" and "agree" may not be 
equal to the interval between “agree” and “undecided.” It is also unlikely 
that the statements are of equal value in “for-ness” or “against-ness.” It is 
unlikely that the respondent can validly react to a short statement on a 
printed form in the absence of real-life qualifying situations. It is doubtful 
whether equal scores obtained by several individuals indicate equal favor- 
ableness toward the given position: Actually, different combinations of 
positions can yield equal score values without necessarily indicating equiv- 
alent positions of attitude or opinion. And even though the opinionnaire 
provides for anonymous response, there is a possibility that people may 

The following statements represent opinions, and your agreement or 
disagreement will be determined on the basis of your particular beliefs. Kindly 
check your position on the scale as the statement first impresses you. Indicate 
what you believe, rather than what you think you should believe. 


a. I strongly agree 
b. I agree 
c. I am undecided 
d. I disagree 
e. I strongly disagree 


 1. Heaven does not exist as an actual place or location. ____ 
 2. God sometimes sets aside natural law, performing miracles. ____ 
 3. Jesus was born of a virgin, without a human father. ____ 
 4. Hell does not exist as an actual place or location. ____ 
 5. The inspiration that resulted in the writing of the Bible 
    was no different from that of any other great religious 
    literature. ____ 
 6. There is a final day of judgment for all who have lived 
    on earth. ____ 
 7. The devil exists as an actual person. ____ 
 8. Prayer directly affects the lives of persons, whether or not 
    they know that such prayer has been offered. ____ 
 9. There is another life after the end of organic life on earth. ____ 
10. When on earth, Jesus possessed and used the power to 
    restore the dead to life. ____ 
11. God is a cosmic force, rather than an actual person. ____ 
12. Prayer does not have the power to change such conditions 
    as a drought. ____ 
13. The creation of the world did not literally occur in the 
    way described in the Old Testament. ____ 
14. After Jesus was dead and buried, he actually rose from 
    the dead, leaving an empty tomb. ____ 
15. Everything in the Bible should be interpreted as literally 
    true. ____ 


FIGURE 7-3 A Likert-type opinionnaire. 


answer according to what they think they should feel rather than how they 
do feel. 


Semantic Differential 


The third method of attitude assessment was developed by Osgood, Suci, 
and Tannenbaum (1957). The semantic differential is similar to the Likert 


MERIT RATING OPINIONNAIRE 


Male ____ Female ____ 
Teaching level: elementary ____ secondary ____ 
Marital status: single ____ married ____ divorced/separated ____ widowed ____ 
Years of teaching experience ____ years. 


The following statements represent opinions, and your agreement 
or disagreement will be determined on the basis of your particular convictions. 
Kindly check your position on the scale as the statement first 
impresses you. Indicate what you believe, rather than what you think you 
should believe. 


a. I strongly agree 
b. I agree 
c. I am undecided 
d. I disagree 
e. I strongly disagree 


 1. It is possible to determine what constitutes 
    merit, or effective teaching. ____ 
 2. A valid and reliable instrument can be developed 
    to measure varying degrees of teaching effectiveness. ____ 
 3. Additional remuneration will not result in 
    improved teaching. ____ 
 4. Merit rating destroys the morale of the teaching 
    force by creating jealousy, suspicion, and distrust. ____ 
 5. Mutual confidence between teachers and administrators 
    is impossible if administrators rate 
    teachers for salary purposes. ____ 
 6. Merit salary schedules will attract more high-quality 
    young people to the teaching profession. ____ 
 7. Merit salary schedules will hold quality teachers 
    in the profession. ____ 
 8. Parents will object to having their children 
    taught by nonmerit teachers. ____ 
 9. Merit rating can be as successful in teaching as 
    it is in industry. ____ 
10. The hidden purpose of merit rating is to hold 
    down salaries paid to most teachers by paying 
    only a few teachers well. ____ 
11. There is no justification for paying poor teachers 
    as well as good teachers are paid. ____ 
12. Apple-polishers will profit more than superior 
    teachers from merit rating. ____ 
13. Merit rating will encourage conformity and 
    discourage initiative. ____ 
14. The way to make teaching attractive is to reward 
    excellence in the classroom. ____ 
15. Most administrators do not know enough about 
    teaching to rate their faculty members fairly. ____ 
16. Salary schedules based on education and experience 
    only encourage mediocre teaching. ____ 


FIGURE 7-4 A Likert-type opinionnaire on merit rating. 

method in that the respondent indicates an attitude or opinion between 
two extreme choices. This method usually provides the individual with a 
seven-point scale anchored by a pair of opposite adjectives, one at each end 
of the scale, such as good-bad, unhealthy-healthy, clean-dirty. The respondent 
is asked to rate a group, individual, or object on each of these bipolar scales. 

One author of this book had a student who used the semantic differential 
method to compare the attitudes of regular teachers and special-education 
teachers toward mentally retarded, learning-disabled, and behavior-disordered 
children. The results of the semantic differential can be 
graphically displayed as profiles. Figure 7-5 shows a partial profile of the 
regular and special-education teachers when asked about mentally retarded 
children. 
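
A profile of this kind is simply the mean rating of each group on each bipolar scale. The sketch below assumes invented ratings on a seven-point scale (1 nearest the first adjective, 7 nearest the second); the group labels, the numbers, and the helper names `profile` and `text_profile` are hypothetical and do not reproduce the study's data.

```python
from statistics import mean

# Hypothetical ratings: one dictionary per teacher, one entry per scale.
ratings = {
    "regular": [
        {"good-bad": 3, "clean-dirty": 4, "honest-dishonest": 2},
        {"good-bad": 5, "clean-dirty": 4, "honest-dishonest": 2},
    ],
    "special": [
        {"good-bad": 2, "clean-dirty": 3, "honest-dishonest": 1},
        {"good-bad": 2, "clean-dirty": 5, "honest-dishonest": 3},
    ],
}


def profile(group_ratings):
    """Mean rating of a group on each bipolar scale."""
    scales = list(group_ratings[0])
    return {s: mean(r[s] for r in group_ratings) for s in scales}


def text_profile(group_profile, points=7, mark="*"):
    """Draw a rough profile, one line per scale, as plain text."""
    lines = []
    for scale, value in group_profile.items():
        left, right = scale.split("-")
        row = ["."] * points
        row[round(value) - 1] = mark  # nearest scale position
        lines.append(f"{left:>8} {' '.join(row)} {right}")
    return "\n".join(lines)


profiles = {group: profile(rs) for group, rs in ratings.items()}
for group, group_profile in profiles.items():
    print(group)
    print(text_profile(group_profile))
```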

The semantic differential has limitations similar to those of the Thurstone 
and Likert approaches. In spite of these limitations, however, the 
process of opinion measurement has merit. Until more precise measures 
of attitude are developed, these techniques can serve a useful purpose in 
social research. 


FIGURE 7-5 Semantic profiles for regular class and special class teachers. (Dots represent regular class 
teachers and Xs represent special class teachers.) 

[Profile chart not reproduced. Concept rated: "The Average Retarded Child Is:" Scales recoverable from the figure: Innocent, Hard, Strong, Clean, Healthy (opposite poles lost), Honest-Dishonest, Good-Bad, Moral-Immoral, Fair-Unfair.] 


THE INTERVIEW 


The interview is in a sense an oral questionnaire. Instead of writing the 
response, the subject or interviewee gives the needed information orally 
and face-to-face. 

With a skillful interviewer, the interview is often superior to other 
data-gathering devices. One reason is that people are usually more willing 
to talk than to write. After the interviewer gains rapport or establishes a 
friendly, secure relationship with the subject, certain types of confidential 
information may be obtained that an individual might be reluctant to put 
in writing. (In order to establish sufficient rapport, however, it may be 
necessary to consider the sex, race, and possibly other characteristics of the 
interviewer in relation to the interviewee. For instance, a woman should 
probably interview rape victims, and a black person should interview other 
blacks regarding instances of discrimination that they have experienced.) 

Another advantage of interviewing is that the interviewer can explain 
more explicitly the investigation's purpose and just what information he 
or she wants. If the subject misinterprets the question, the interviewer may 
follow it with a clarifying question. At the same time, he or she may evaluate 
the sincerity and insight of the interviewee. It is also possible to seek the 
same information in several ways at various stages of the interview, thus 
checking the truthfulness of the responses. And through the interview 
technique the researcher may stimulate the subject's insight into his or her 
own experiences, thereby exploring significant areas not anticipated in the 
original plan of investigation. 

The interview is also particularly appropriate when dealing with young 
children. If one were to study what junior high school students like and 
dislike in teachers, some sort of written schedule would probably be sat- 
isfactory. But in order to conduct a similar study with first-grade pupils, 
the interview would be the only feasible method of getting responses. The 
interview is also well suited for illiterates and those with language diffi- 
culties. 

Preparation for the interview is a critical step in the procedure. In- 
terviewers must have a clear conception of just what information they need. 
They must clearly outline the best sequence of questions and stimulating 
comments that will systematically bring out the desired responses. A written 
outline, schedule, or checklist will provide a set plan for the interview, 
precluding the possibility that the interviewer will fail to get important and 
needed data. 

An open-form question, in which the subject is encouraged to answer 
in his or her own words at some length, is likely to provide greater depth 
of response. In fact, this penetration exploits the advantage of the interview 
in getting beneath-the-surface reactions. However, distilling the essence of 
the reaction is difficult, and interviewer bias may be a hazard. The closed-form 
question (in the pattern of a multiple-choice response) is easier to 
record but may yield more superficial information. 

Leading questions that unconsciously imply a specific answer should 
be avoided. The question, "Do you think that the United Nations has failed 
in its peace-keeping function?" illustrates the danger of eliciting agreement 
to an idea implanted in the question. It would be preferable to phrase it, 
“How effective do you feel the United Nations has been in its peace-keeping 
function?" This form is neutral and does not suggest a particular response. 
A question of this type would appropriately be followed by, "Could you 
explain how you reached this conclusion?" 

The relationship between interviewer and subject requires an ex- 
pertness and sensitivity that might well be called an art. The initial task of 
securing the confidence and cooperation of the subject is crucial. Talking 
in a friendly way about a topic of interest to the subject will often dispel 
hostility or suspicion, and before he or she realizes it, the subject is freely 
giving the desired information. As in the use of the questionnaire, the 
interviewer must be able to assure the subject that responses will be held 
in strict confidence. When interviews are not tape recorded, it is necessary 
for the interviewer to take written notes, either during the interview or 
immediately thereafter. The actual wording of the responses should be 
retained. It is advisable to make the interpretation later, separating this 
phase of analysis from the actual recording of responses. 

Recording interviews on tape is preferred because tape recorders are 
convenient and inexpensive and obviate the necessity of writing during the 
interview, which may be distracting to both interviewer and subject. Interviews 
recorded on tape may be replayed as often as necessary for complete and 
objective analysis at a later time. In addition to the words, the tone of voice 
and emotional impact of the response are preserved by the tapes. It is 
unethical to record interviews without the knowledge and permission of 
the subject. 

In order to obtain reliable and objective data, interviewers must be 
carefully trained. This training should include skills in developing rapport, 
asking probing questions, preparing for the interview, and a host of other 
details. The Institute for Social Research at the University of Michigan has 
published an excellent interview-training manual that includes a 90-minute 
audio cassette of a model interview and some exercises (Guenzel, Berkmans, 
& Cannell, 1983). 


Validity and Reliability of the Interview 


The key to effective interviewing is establishing rapport. This skill is some- 
what intangible, including both a personality quality and a developed ability. 
Researchers have studied the relationship of interviewer status to the 
achievement of this confidence. Many studies have been conducted in which 
interviewers of different status have interviewed the same respondents. 
The responses were often significantly different both in how much the 
subject was willing to reveal and in the nature of the attitudes expressed. 

Ethnic origin seems to be important. Interviewers of the same ethnic 
background as their subjects seem to be more successful in establishing 
rapport. When there is an ethnic difference, a certain amount of suspicion 
and even resentment may be encountered. The same relationship seems 
to prevail when the social status of the interviewer and respondent is dif- 
ferent. Even the interviewer's clothing may have an inhibiting effect. Younger 
interviewers seem to be more successful than older, particularly when mid- 
dle-aged respondents are involved. Women seem to have a slight advantage 
over men in getting candid responses, although depending on the topic 
(e.g., male impotence), male interviewers might be more successful. Of 
course, experience tends to improve interviewing skill. 

Validity is greater when the interview is based upon a carefully de- 
signed structure, thus ensuring that the significant information is elicited 
(content validity). The critical judgment of experts in the field of inquiry 
is helpful in selecting the essential questions. 

Reliability, or the consistency of response, may be evaluated by re- 
stating a question in slightly different form at a later time in the interview. 
Repeating the interview at another time may provide another estimate of 
the consistency of response. If more than one interviewer is used, the 
researcher must demonstrate reliability of technique and scoring among 
the interviewers. This can be done through observing the interviews and 
having more than one interviewer score each tape or transcript. 
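
One way to express agreement between two scorers of the same tapes is sketched below. The text does not name a particular index; percent agreement and Cohen's kappa (agreement corrected for chance) are used here as common choices, and the category labels and codes are invented for illustration.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Proportion of responses that two scorers coded identically."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Agreement between two scorers, corrected for chance agreement."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    count_a, count_b = Counter(codes_a), Counter(codes_b)
    categories = set(codes_a) | set(codes_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1:
        return 1.0  # both scorers used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned to eight taped responses by two scorers.
scorer_1 = ["favorable", "favorable", "neutral", "unfavorable",
            "favorable", "neutral", "unfavorable", "favorable"]
scorer_2 = ["favorable", "neutral", "neutral", "unfavorable",
            "favorable", "neutral", "favorable", "favorable"]

agreement = percent_agreement(scorer_1, scorer_2)
kappa = cohens_kappa(scorer_1, scorer_2)
print(agreement, round(kappa, 2))
```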

As a data-gathering technique, the interview has unique advantages. 
In areas where human motivation is revealed through actions, feelings, 
and attitudes, the interview can be most effective. In the hands of a skillful 
interviewer, a depth of response is possible that is quite unlikely to be 
achieved through any other means. 

This technique is time-consuming, however, and one of the most 
difficult to employ successfully. The danger of interviewer bias is constant. 
Because the objectivity, sensitivity, and insight of the interviewer are crucial, 
this procedure is one that requires a level of expertness not ordinarily 
possessed by inexperienced researchers. 


Q METHODOLOGY 


Q methodology, devised by Stephenson (1953), is a technique for scaling 
objects or statements. It is a method of ranking attitudes or judgments 
(similar to the first step in the Thurstone technique) and is particularly 
effective when the number of items to be ranked is large. The procedure 
is known as a Q-sort, in which cards or slips bearing the statements or items 
are arranged in a series of numbered piles. Usually nine or eleven piles 
are established, representing relative positions on a standard scale. Some 
examples of simple polarized scales are illustrated. 


most important          least important 
most approve            least approve 
most liberal            least liberal 
most favorable          least favorable 
most admired            least admired 
most like me            least like me 


The respondent is asked to place a specified number of items on each 
pile, usually on the basis of an approximately normal or symmetrical dis- 
tribution. From 50 to 100 items should be used. 


MOST LIKE ME                                   LEAST LIKE ME 

Pile   [pile numbers illegible in source] 
%      [percentage of items per pile illegible in source] 


Self-concept Q-sort 


Let us assume that a Q-sort has been designed to measure the before-and- 
after therapy status of a subject. A few examples of appropriate traits are 
presented to be placed on the scale. 


afraid            ignored           discouraged 
suspicious        admired           energetic 
successful        disliked          loved 
enthusiastic      cheerful          hated 
friendly          happy             stupid 


A change in position of items from before-therapy to after-therapy 
would indicate possible change or improvement in self-esteem. Computing 
the coefficient of correlation between the pile positions of items before and 
after therapy would provide a measure of change. If no change in item 
placement had occurred, the coefficient of correlation would be + 1.00. If 
a completely opposite profile appeared, the coefficient would be — 1.00. 
Although a perfect + 1.00 or — 1.00 coefficient is improbable, a high pos- 
itive coefficient would indicate little change, whereas a high negative coef- 
ficient would indicate significant change. 
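
The computation can be sketched as follows. The pile numbers are invented (1 is assumed to mean "most like me" and 9 "least like me"), only a handful of traits are shown where a real Q-sort would use 50 to 100, and the helper names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("all items were placed on the same pile")
    return sxy / sqrt(sxx * syy)


# Hypothetical pile placements before and after therapy.
before = {"afraid": 2, "discouraged": 1, "ignored": 3, "successful": 8,
          "admired": 7, "loved": 8, "cheerful": 6, "happy": 7}
after = {"afraid": 7, "discouraged": 8, "ignored": 6, "successful": 3,
         "admired": 4, "loved": 2, "cheerful": 3, "happy": 2}


def q_sort_correlation(first, second):
    """Correlate the pile positions of the same items in two sortings."""
    items = sorted(first)
    return pearson([first[i] for i in items], [second[i] for i in items])


r_change = q_sort_correlation(before, after)
r_same = q_sort_correlation(before, before)
print(round(r_change, 2), round(r_same, 2))
```

In this invented case the coefficient is strongly negative, the pattern the text associates with a marked change in self-concept, while an unchanged sort correlates +1.00 with itself.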

Another type of Q-sort solicits the composite judgment of a selected 
panel of experts (in this case, professors of educational research). The 
criterion of judgment involves the relative importance of research concepts 
that should be included in the introductory course in educational research. 
One hundred slips, each listing a concept, were to be sorted into nine piles, 
ranging from most important to least important. A few of the concepts 
that were considered are listed: 


hypothesis                        historical method 
probability                       survey 
dependent variable                null hypothesis 
coefficient of correlation        preparing a questionnaire 
sources of reference materials    deductive method 
preparing the research report     descriptive method 
randomization                     sampling 
post hoc fallacy                  intervening variables 
experimental method               independent variable 
interviewing                      Q-sorts 
level of significance             standard deviation 
the research proposal             nonparametric statistics 
attitude studies                  action research 


The mean value of the positions assigned to each item indicates the 
composite judgment of the panel as to its relative importance. 
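
As a brief sketch of this calculation, assume three panel members and four of the concepts, with pile 1 meaning most important and pile 9 least important. The placements are invented for illustration.

```python
from statistics import mean

# Hypothetical pile placements by three panel members.
placements = {
    "hypothesis": [1, 2, 1],
    "sampling": [2, 2, 3],
    "Q-sorts": [8, 7, 9],
    "null hypothesis": [3, 4, 2],
}

# Mean pile position of each concept across the panel.
composite = {concept: mean(piles) for concept, piles in placements.items()}

# Concepts ordered from most to least important in the panel's judgment.
ranking = sorted(composite, key=composite.get)
print(ranking)
```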

Two applications of the Q-sort technique have been illustrated in our 
simplified discussion. The first attempted to measure change in the attitude 
of an individual toward himself or herself, the second the composite judg- 
ment of a group of individuals. Many types of analysis may be carried on 
in the area of attitudes by the use of Q methodology. Researchers contem- 
plating the use of this technique should carefully consider the theoretical 
assumptions underlying the criteria and the items selected. 


SOCIAL SCALING 


Sociometry 


Sociometry is a technique for describing the social relationships among 
individuals in a group. In an indirect way it attempts to describe attractions 
or repulsions between individuals by asking them to indicate whom they 
would choose or reject in various situations. Children in a school classroom 
may be asked to name in order of preference (usually two or three) the 
child or children that they would invite to a party, eat lunch with, sit next 
to, work on a class project with, or have as a close friend. Although some 
researchers object to the method, it is also common to ask the children to 
name the children, again in order of preference, that they would least like 
to invite to a party, eat lunch with, sit next to, and so forth. 

There is an extensive body of sociometric research on classroom groups 
from kindergarten through college, fraternities and sororities, dormitory 
residents, camp groups, factory and office workers, military combat units, 
and entire communities. The United States Air Force has used sociometry 
to study the nature of leadership in various situations. For example, the 
following question was used in a study of air combat crews: “What member 
of the crew would you select, disregarding rank, as the most effective leader 
if your plane were forced down in a remote and primitive area? Name 
three, in order of your preference.” 


Scoring Sociometric Choices 


One widely used procedure is to count the number of times an individual 
is chosen, disregarding the order of choice. This is the simplest method, 
but the objection has been raised that it is insensitive, 
for it does not distinguish between a first and third choice. 

Another procedure is to score a first choice three points, a second 
choice two points, and a third choice one point. This plan’s weakness is 
that it suggests that the difference between a third choice and no choice 
at all is identical to the differences between third, second, and first choices. 
This assumption is difficult to defend. 
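
The two procedures just described can be compared on the same set of choices. The sketch below uses five invented children, each naming three others in order of preference; the names and function names are hypothetical.

```python
from collections import Counter

# Hypothetical choices: each child names three others, first choice first.
choices = {
    "Ann": ["Ben", "Cal", "Dee"],
    "Ben": ["Ann", "Dee", "Cal"],
    "Cal": ["Ben", "Ann", "Dee"],
    "Dee": ["Ben", "Ann", "Eve"],
    "Eve": ["Dee", "Ben", "Ann"],
}


def simple_counts(all_choices):
    """Count how often each child is chosen, ignoring order of choice."""
    counts = Counter({name: 0 for name in all_choices})
    for picks in all_choices.values():
        counts.update(picks)
    return counts


def weighted_scores(all_choices, weights=(3, 2, 1)):
    """Score a first choice 3 points, a second 2, and a third 1."""
    scores = Counter({name: 0 for name in all_choices})
    for picks in all_choices.values():
        for weight, chosen in zip(weights, picks):
            scores[chosen] += weight
    return scores


counts = simple_counts(choices)
scores = weighted_scores(choices)
print(dict(counts))
print(dict(scores))
```

In these invented data Ann, Ben, and Dee are each chosen four times, so the simple count cannot separate them, whereas the weighted scores do.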

A third scoring procedure is based upon the concept of the normal 
curve standard score distribution. However, this method is more complex 
and seldom used. 

Once obtained, the scores for each individual in the group can be 
related to such measures as intelligence or other traits that can be measured 
by tests, or to such categories as sex, race, nationality, religious affiliation, 
economic status, birth order, family size, grade-point average, teacher, 
employer, or other characteristics that may be of interest to the researcher. 


The Sociogram 


Sociometric choices may be represented graphically on a chart known as a 
sociogram. There are many versions of the sociogram pattern, and the reader 
is urged to consult specialized references on sociometry. A few observations 
will illustrate the nature of the sociogram. 

In constructing a sociogram, boys may be represented by triangles and 
girls by circles. A choice may be represented by a single-pointed arrow, a 
mutual choice by an arrow pointing in both directions. Those chosen 
most often are referred to as stars, those not chosen by others as isolates. 
Small groups made up of individuals who choose one another are cliques. 
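
Before a sociogram is drawn, the stars, isolates, and mutual choices can be identified from the raw choices. The following sketch assumes six invented pupils identified by number, each making two choices; treating "star" as the pupil or pupils with the highest count is an assumption made for this illustration.

```python
# Hypothetical first and second choices, keyed by pupil number.
choices = {
    1: [2, 3],
    2: [1, 3],
    3: [2, 1],
    4: [3, 2],
    5: [4, 3],
    6: [3, 5],
}


def times_chosen(all_choices):
    """Number of times each pupil is chosen by others."""
    counts = {pupil: 0 for pupil in all_choices}
    for picks in all_choices.values():
        for chosen in picks:
            counts[chosen] = counts.get(chosen, 0) + 1
    return counts


def mutual_pairs(all_choices):
    """Pairs of pupils who chose each other."""
    pairs = set()
    for chooser, picks in all_choices.items():
        for chosen in picks:
            if chooser in all_choices.get(chosen, []):
                pairs.add(tuple(sorted((chooser, chosen))))
    return sorted(pairs)


counts = times_chosen(choices)
most = max(counts.values())
stars = [p for p, c in counts.items() if c == most]
isolates = [p for p, c in counts.items() if c == 0]
mutual = mutual_pairs(choices)
print(stars, isolates, mutual)
```

Here pupils 1, 2, and 3 all choose one another, which is the pattern the text calls a clique.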

Identifying numbers are placed within the symbols. Numbers of those 
chosen most often are placed nearest the center of the diagram, and numbers 
of those chosen less often are placed further outward. Those not 
chosen are, literally, on the outside (see Figure 7-6). Remember, however, 




FIGURE 7-6 Sociogram showing first and second choices in a third grade class. 


that relationships among individuals in a group are changeable. Children's 
choices are most temporary, for stability tends to develop only with age. 

Students of group relationships and classroom teachers may construct 
a number of sociograms over a period of time to measure changes that 
may have resulted from efforts to bring isolates into closer group relation- 
ships or to transform cliques into more general group membership. The 
effectiveness of socializing or status-building procedures can thus be meas- 
ured by the changes revealed in the sociogram. Because sociometry is a 
peer rating rather than a rating by superiors, it adds another dimension 
to the understanding of members of a group. 


"Guess-who" Technique 


A process of description closely related to sociometry is the "guess-who" 
technique. Developed by Hartshorne and May (1929), the process consists 
of descriptions of the various roles played by children in a group. 
Children are asked to name the individuals who fit certain verbal 
descriptions. 


This one is always happy. 

This one is always picking on others. 
This one is always worried. 

This one never likes to do anything. 
This one will always help you. 


Items of this type yield interesting and significant peer judgments 
and are useful in the study of individual roles. Of course, the names of 
children chosen should not be revealed. 


Social-distance Scale 


Another approach to the description and measurement of social relation- 
ships is the social-distance scale, developed by Bogardus (1933). This device 
attempts to measure to what degree an individual or group of individuals 
is accepted or rejected by another individual or group. 

Various scaled situations, with score values ranging from acceptance 
to rejection, are established. The individual checks his or her position by 
choosing one of the points on the scale. For example, in judging acceptance 
of different minority groups, the choices might range between these ex- 
tremes: 


Complete acceptance    I wouldn't object to having a member of this 
                       group become a member of my family by marriage. 

Partial acceptance     I wouldn't mind sitting next to a member of 
                       this group on a bus. 

Rejection              I don't think that members of this group 
                       should be admitted into our country. 


When applied to an individual in a classroom situation, the choices 
might range between these extremes: 


Complete acceptance    I'd like to have this student as my best friend. 
Partial acceptance     I wouldn't mind sitting near this student. 
Rejection              I wish this student weren't in my room. 


Of course, in the real social-distance scale, illustrated by the sample 
items above, there would be a larger number of evenly spaced scaled positions 
(usually seven in number), giving a more precise measure of acceptance 
or rejection. 

Devices of the type described here have many possibilities for the 
description and measurement of social relationships and, in this important 
area of social research, may yield interesting and useful data. 


ORGANIZATION OF DATA COLLECTION 


The following discussion is directed to the beginner and does not suggest 
appropriate procedures for the advanced researcher. Theses, dissertations, 
and advanced research projects usually involve sophisticated experimental 
designs and statistical analysis. The use of the computer has become stand- 
ard procedure. Because it can effectively process complex variable rela- 
tionships, it has made a significant contribution to research. Chapter 10 
discusses computers and their uses for organizing and analyzing data. 

When the results of an observation, interview, questionnaire, opin- 
ionnaire, or test are to be analyzed, problems of organization confront the 
researcher. Even when a computer will be used, the first problem is to 
designate appropriate, logical, and mutually exclusive categories for tab- 
ulation of the data. At times the hypothesis or question to be answered 
may suggest the type of organization. If the hypothesis involved the dif- 
ference between the attitudes of men and women toward teacher merit 
rating, the categories male and female would be clearly indicated. In other 
instances the categories are not determined by the hypothesis, and other 
subdivisions of the group under investigation may be desirable. The re- 
searcher should keep these issues in mind when selecting, or designing, 
the data collection procedure. Proper attention given to this matter of 
organization early in the research process can save a great deal of time at 
the data analysis phase. 

When the responses or characteristics of a group are analyzed, it is 
sometimes satisfactory to describe the group as a whole. In simple types of 
analysis, when the group is sufficiently homogeneous, no breakdown into 
subgroups is necessary. But in many situations the picture of the whole 
group is not clear. The heterogeneity of the group may yield data that 
have little meaning. One tends to get an unreal picture of a group of subjects 
that are actually very different from one another, and the differences are 
concealed by a description of a nonexistent or unreal average. In such cases 
it may be helpful to divide the group into more homogeneous subgroups 
that have in common some distinctive characteristics that may be significant 
for the purpose of the analysis. Distinguishing between the response of 
men and women, between elementary and secondary teachers, or between 
gifted and average-learning children may reveal significant relationships. 

For example, a new type of classroom organization may seem to have 
little impact on a group of students. But after dividing the group into two 
subgroups, the gifted and the average learners, some interesting relationships 
may become clear. The grouping may be effective for the bright 
students but most ineffective for the average learners. 
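
A small numerical sketch shows how an overall average can conceal such a difference. The gain scores and subgroup labels below are invented for illustration.

```python
from statistics import mean

# Hypothetical achievement gains under the new classroom organization.
students = [
    {"group": "gifted", "gain": 8},
    {"group": "gifted", "gain": 6},
    {"group": "gifted", "gain": 7},
    {"group": "average", "gain": -6},
    {"group": "average", "gain": -8},
    {"group": "average", "gain": -7},
]


def mean_gain(records):
    """Mean gain for a list of student records."""
    return mean(r["gain"] for r in records)


def mean_gain_by_group(records):
    """Mean gain computed separately for each subgroup."""
    groups = {}
    for record in records:
        groups.setdefault(record["group"], []).append(record)
    return {name: mean_gain(members) for name, members in groups.items()}


overall = mean_gain(students)
by_group = mean_gain_by_group(students)
print(overall, by_group)
```

The whole-group mean of zero suggests no effect, while the subgroup means point in opposite directions.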

Many studies employ the classification of data into dichotomous, or 
twofold, categories. When the categories are established on the basis of test 
scores, rankings, or some other quantitative measure, it may be advisable 
to compare those at the top with those at the bottom, omitting from the 
analysis those near the middle of the distribution. It is possible to compare 
the top third with the bottom third, or the top 25 percent with the bottom 
25 percent. This eliminates those cases near the midpoint that tend to 
obscure the differences that may exist. Through elimination of the middle 
portion, sharper contrast is achieved, but the risk of the regression effect 
is increased. 
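
Selecting the extreme groups is a matter of ranking and slicing. In the sketch below the scores and identifiers are invented, and `divisor` is an illustrative parameter: 3 compares thirds, 4 compares the top and bottom 25 percent.

```python
def extreme_groups(scores, divisor=3):
    """Return (top, bottom) groups, each 1/divisor of the cases.

    scores maps a case identifier to a test score; cases near the
    middle of the distribution are left out of both groups.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = len(ranked) // divisor
    if k == 0:
        return [], []
    return ranked[:k], ranked[-k:]


# Hypothetical test scores for nine students.
scores = {"s1": 95, "s2": 88, "s3": 84, "s4": 79, "s5": 75,
          "s6": 70, "s7": 66, "s8": 61, "s9": 52}

top, bottom = extreme_groups(scores)
print(top, bottom)
```

Because both groups are chosen for their extreme scores, later measurements of the same students may drift toward the mean, which is the regression effect the text warns about.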

Comparisons are not always dichotomous. At times it is desirable to 
divide a sample into more than two categories, depending on the nature 
of the variables that are to be considered. 


Outside Criteria for Comparison 


In addition to the comparisons that may be made between subgroups within 
the larger group, the group may be analyzed in terms of some outside criteria. 
Of course, it must be assumed that reasonably valid and reliable measuring 
devices are available for making such comparisons. These "measuring sticks" 
may consist of standardized tests, score cards, frequency counts, and phys- 
ical as well as psychological measuring devices. Some of these outside cri- 
teria include the following: 


1. Prevailing conditions, practices, or performance of comparable units. Comparison 
may be made with other communities, schools, and classes. 
Comparisons may be made with groups that represent best conditions 
or practices or typical or average status, or with equated groups that 
have been matched in terms of certain variables, leaving one variable 
or a limited number of variables for comparison. 

2. What experts believe to constitute best conditions or practices. These experts 
may comprise a panel specially chosen for the purpose. A group of 
practitioners in the field who are assumed to be most familiar with 
the characteristics under consideration, or the survey staff itself, may
constitute the body of experts. The judgments of recognized author- 
ities who publish their opinions are frequently selected as criteria. 

3. What a professional group, a commission, an accrediting agency, or another
scholarly deliberative body establishes as appropriate standards. These stand-
ards may be expressed as lists of objectives or may be quantitative
measures of status for accreditation or approval. The American Med-
ical Association's standards for accreditation of medical schools, the
accreditation standards of the North Central Association of Secondary
Schools and Colleges, or the standards of the National Council for
the Accreditation of Teacher Education for programs of teacher ed-
ucation are examples of evaluative criteria.

4. Laws or rules that have been enacted or promulgated by a legislative or quasi-
legislative body. Teacher certification regulations, school-building
standards, or health and safety regulations provide appropriate cri-
teria for comparison.

5. Research evidence. The factors to be analyzed may be examined in the
light of principles confirmed by published scholarly research.

6. Public opinion. Although not always appropriate as a criterion of what
should be, the opinions or views of "the man on the street" are some-
times appropriate as a basis for comparison.


Sorting and Tabulating Data 


Tabulation is the process of transferring data from the data-gathering in- 
struments to the tabular form in which they may be systematically exam- 
ined. This process may be performed in a number of ways. In simple types 
of research, hand-tabulating procedures are usually employed. In more 
extensive investigations, a card-tabulating process may be used, possibly 
including machine methods. 

Most simple research studies employ the method of hand-sorting and
recording, with tabulations written on tabulation sheets. To save time and
ensure greater accuracy, it is recommended that two people work together,
one reading the data while the other records them on the tabulation sheet.
In constructing tally form sheets, it is important to provide enough space
to record the tallies in each category.

The following discussion on hand tabulation emphasizes the impor- 
tance of careful planning before the sorting and tabulation begin. Without 
careful planning, inexperienced researchers may waste effort when tabu- 
lating responses on a set of questionnaires filled out by a group of teachers. 
After completing the tabulation, they may decide to compare the responses 
of elementary teachers with secondary teachers. This would involve retab- 
ulating the responses of the questionnaire. It might then occur to them 
that it would be interesting to compare men’s responses to women’s. An- 
other handling of the questionnaires would be necessary. 

If they had decided upon their categories before tabulation, one han- 
dling of the questionnaires would have been sufficient. Sorting the ques- 
tionnaires into two piles, one for elementary teachers and another for 
secondary teachers, then sorting each of these into separate piles for men 
and for women, would have yielded four stacks. Then, through the separate 
tabulation of each pile, one planned operation would have yielded the same
amount of information as three unplanned operations. 

Before tabulating questionnaires or opinionnaires, it is always im- 
portant to decide upon the categories that are to be analyzed. If this decision 
is delayed, it may be necessary to retabulate the items a number of times, 
needlessly consuming a great deal of time and effort. 

If the data-gathering device called for a larger number of responses,
the system of presorting would be similar. It would be advisable, however, 
to set up a separate tabulation sheet for each of the categories, because a 
single sheet would become unwieldy. For all but the simplest cases, a com- 
puter would handle this easily as long as each variable is coded properly 
(e.g., 1 = male, 2 = female). 

Figure 7-7 illustrates how a three-item opinionnaire response could
be tabulated for a question such as the following:


An honor system would eliminate cheating in examinations.
I agree ____
I don't know ____
I disagree ____


Students may apply these procedures to classify and tabulate similar 
types of data. These data sheets are not ordinarily presented in the report, 
but they may suggest ways in which some of the data may be presented as 
tables or graphic figures. Relatively simple computer programs are available 
that can handle much of this sorting and tabulating if the data are properly 
organized initially (see Chapter 10). 
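
The presorting plan described above (two teaching levels, two sexes, three possible answers) can be sketched as a single-pass tabulation. The sketch below is an illustration only: the category labels, the `tabulate` function, and the six coded records are hypothetical, and it is not the program referred to in Chapter 10.

```python
from collections import Counter

LEVELS = ("elementary", "secondary")
SEXES = ("men", "women")
ANSWERS = ("agree", "don't know", "disagree")


def tabulate(records):
    """Count every (level, sex, answer) combination in a single pass."""
    counts = Counter(records)
    # One row per subgroup, one column per answer: 4 x 3 = 12 cells.
    return {
        (level, sex): [counts[(level, sex, answer)] for answer in ANSWERS]
        for level in LEVELS
        for sex in SEXES
    }


# Hypothetical coded questionnaires, one tuple per respondent.
records = [
    ("elementary", "women", "agree"),
    ("elementary", "women", "disagree"),
    ("elementary", "men", "agree"),
    ("secondary", "women", "don't know"),
    ("secondary", "men", "disagree"),
    ("secondary", "men", "disagree"),
]

table = tabulate(records)
print(f"{'':<18}", *ANSWERS, sep="  ")
for (level, sex), row in table.items():
    print(f"{level:<11}{sex:<7}", *row, sep="  ")
```

Because the categories are fixed before counting begins, the questionnaires are handled once, which is the point made above about planning before tabulation.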


Tables and Figures 


The process of tabulation just described is the first step in
the construction of the tables for a research report. It is likely that the
beginning researcher thinks of tables purely as aids to understanding. Dis-
playing data in rows and columns according to some logical plan of clas-
sification may serve an even more important purpose in helping researchers
to see the similarities and relationships of their data in bold relief.


[Figure 7-7: tabulation form with response columns AGREE, DON'T KNOW, and DISAGREE]

FIGURE 7-7 Tabulation form providing for the analysis of 12 possible response categories based upon
question 1 on an opinionnaire.

A discussion of the construction and use of tables and figures is pre- 
sented in some detail in Chapter 11. 


Percentage Comparisons 


Presenting data by frequency counts has a number of limitations. If the 
groups to be compared are unequal in size, the frequency count may have 
little meaning. Converting to percentage responses enables the researcher 
to compare subgroups of unequal size meaningfully. Translating frequency 
counts into percentages indicates the number-per-hundred compared. The 
provision of a common base makes the comparison clear. 

Several limitations should be recognized in using percentage com- 
parisons. Unless the number of frequencies is reasonably large, a per- 
centage may be misleading and may seem to suggest an unwarranted gen- 
eralization. It may be appropriate to indicate that, of four physicians 
interviewed, one believed that a particular medication would be harmful. 
To indicate that 25 percent of physicians interviewed believed that the 
medication would be harmful creates an image of a larger sample of phy- 
sicians than was actually interviewed. It is essential that both frequency 
counts and percentage responses be included in the presentation and anal- 
ysis of data. 

In converting frequency counts to percentages, rounding to the near- 
est percentage point is preferable. Because the type of data presented in 
educational research is not very precise, there is little value in expressing 
percentages in decimal values. In other situations, however, such as the 
drug industry, where ratio scales of measurement are often used, it would 
be extremely important to carry a percentage reported to four or five 
decimal places, particularly when a trace of an element would be harmful 
if exceeded.

When using percentages in dichotomous comparisons, it is necessary 
to state the percentage in only one of the categories. If 65 percent of the 
respondents are men, it is not necessary to indicate that 35 percent are 
women. Unnecessary duplication is evidence of poor reporting. 
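
A short sketch may make the conversion concrete. The counts below are hypothetical, and `with_percentages` is an illustrative helper rather than an established routine; it reports each frequency together with its percentage, rounded to the nearest whole point as recommended above.

```python
def with_percentages(frequencies):
    """Pair each frequency count with its percentage of the group total."""
    total = sum(frequencies.values())
    return {
        category: (count, round(100 * count / total))
        for category, count in frequencies.items()
    }


# Hypothetical counts for two subgroups of unequal size.
elementary = {"agree": 26, "disagree": 14}   # 40 respondents
secondary = {"agree": 66, "disagree": 54}    # 120 respondents

elementary_table = with_percentages(elementary)
secondary_table = with_percentages(secondary)

for name, table in (("Elementary", elementary_table), ("Secondary", secondary_table)):
    for category, (count, percent) in table.items():
        print(f"{name:<11}{category:<9}{count:>4}  ({percent}%)")
```

In this invented example the raw counts suggest that agreement is stronger among secondary teachers (66 against 26), while the percentages show the reverse (55 percent against 65 percent). Note that Python's `round` sends exact halves to the nearest even number, a detail that matters only when a percentage falls exactly on .5.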


Crossbreaks 


A crossbreak table is a way of presenting observations; it is a useful device
for organizing and describing a data relationship. An example of an opin-
ionnaire response is Figure 7-8. The topic presented on the opinionnaire
was: "A legal abortion, during the first trimester, should be the right of
any woman."


[Figure 7-8: crossbreak table]
Numbers are expressed as frequencies of response.
Numbers in parentheses are expressed as percentages.


FIGURE 7-8 Crossbreak tabulation of attitudes regarding abortion.
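
A crossbreak of this kind can be sketched as follows. The group labels, the counts, and the `crossbreak` function are all hypothetical; they are not the data of Figure 7-8. Each cell holds a frequency followed by its row percentage.

```python
from collections import Counter


def crossbreak(pairs):
    """Return {row: {column: (count, row_percent)}} for (row, column) pairs."""
    counts = Counter(pairs)
    rows = sorted({row for row, _ in counts})
    columns = sorted({column for _, column in counts})
    table = {}
    for row in rows:
        row_total = sum(counts[(row, column)] for column in columns)
        table[row] = {
            column: (counts[(row, column)],
                     round(100 * counts[(row, column)] / row_total))
            for column in columns
        }
    return table


# Hypothetical responses from two groups of forty respondents each.
pairs = (
    [("Group A", "agree")] * 30 + [("Group A", "disagree")] * 10
    + [("Group B", "agree")] * 12 + [("Group B", "disagree")] * 28
)

table = crossbreak(pairs)
for row, cells in table.items():
    shown = "  ".join(f"{col}: {n} ({pct})" for col, (n, pct) in cells.items())
    print(f"{row}  {shown}")
```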


Ranking and Weighting Items


There are times when response categories are not mutually exclusive. Pref- 
erences for certain things or reasons for an act are usually explained in 
terms of a number of factors, rarely single ones. It would be unrealistic to 
expect respondents to indicate their favorite type of recreation or the single 
reason they decided to attend a particular university. In such instances it 
would be appropriate to ask the respondents to indicate two or three re- 
sponses in order of importance or preference. This ranking of items makes 
possible a useful method of analysis. Items may be weighted in inverse 
order. For example, if three items are to be ranked, it is appropriate to 
assign weightings as follows: 


1st choice    3 points
2nd choice    2 points
3rd choice    1 point


A composite judgment of the importance of the items could be de- 
termined by the weighted totals or averages for all the respondents. 

Remember that when items are ranked in order, the differences be- 
tween ranked items may not be equal. Ranking is not the most refined 
method of scaling. 
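
The inverse weighting can be sketched as a small routine. The item names and the three respondents' rankings below are hypothetical, as is the function name `weighted_totals`.

```python
from collections import defaultdict


def weighted_totals(rankings, n_choices=3):
    """Sum inverse-order weights: 1st choice = 3, 2nd = 2, 3rd = 1."""
    totals = defaultdict(int)
    for ranked_items in rankings:
        for position, item in enumerate(ranked_items[:n_choices]):
            totals[item] += n_choices - position
    return dict(totals)


# Hypothetical rankings: each respondent lists three reasons for
# choosing a university, most important first.
rankings = [
    ["cost", "location", "reputation"],
    ["reputation", "cost", "location"],
    ["cost", "reputation", "program"],
]

totals = weighted_totals(rankings)
for item, points in sorted(totals.items(), key=lambda pair: -pair[1]):
    print(f"{item:<12}{points}")
```

The totals rank the items as a composite judgment, but, as noted above, the weights treat the distance between first and second choice as equal to the distance between second and third, which may not be true for any given respondent.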


LIMITATIONS AND SOURCES OF ERROR 


A number of limitations and sources of error in the analysis and interpre- 
tation of data can jeopardize the success of an investigation. New research- 
ers in particular need to be aware of these potential pitfalls. Some of these 
problems include: 


1. Confusing statements with facts. A common fault is the acceptance
of statements as facts. What individuals report may be a sincere
expression of what they believe to be the facts in a case, but these
statements are not necessarily true. Few people observe skillfully, and
many forget quickly. It is the researcher's responsibility to verify all
statements as completely as possible before they are accepted as facts.

2. Failure to recognize limitations. The very nature of research implies
certain restrictions or limitations about the group or the situation
described: its size, its representativeness, and its distinctive compo-
sition. Failure to recognize these limitations may lead to the formu-
lation of generalizations that are not warranted by the data collected.

3. Careless or incompetent tabulation. When one is confronted with a
mass of data, it is easy to make simple mechanical errors. Placing a
tally in the wrong cell or incorrectly totaling a set of scores can easily
invalidate carefully gathered data. Errors sometimes may be attrib-
uted to clerical helpers with limited ability and little interest in the
research project.

4. Faulty logic. This rather inclusive category embraces a number of
errors in the thought processes of the researcher. Invalid assumptions,
inappropriate analogies, inversion of cause and effect, confusion of
a simple relationship with causation, failure to recognize that group
phenomena may not be used indiscriminately to predict individual
occurrences or behavior, failure to realize that the whole may be
greater than the sum of its parts, belief that frequency of appearance
is always a measure of importance, and many other errors are limi-
tations to accurate interpretation.

5. The researcher's unconscious bias. Although objectivity is the ideal
of research, few individuals achieve it completely. There is great temp-
tation to omit evidence unfavorable to the hypothesis and to over-
emphasize favorable data. Effective researchers are aware of their
feelings and the likely areas of their bias and constantly endeavor to
maintain the objectivity that is essential.


SUMMARY


The researcher chooses the most appropriate instruments and procedures 
that provide for the collection and analysis of data upon which hypotheses 
may be tested. The data-gathering devices that have proven useful in ed- 
ucational research include psychological tests and inventories, question- 
naires, opinionnaires, Q methodology, observation, checklists, rating scales, 
score cards, scaled specimens, document or content analyses, interviews, 
sociograms, “guess-who” techniques, and social-distance scales. 

Some research investigations use but one of these devices. Others
employ a number of them in combination. Students of educational research
should make an effort to familiarize themselves with the strengths and
limitations of these tools and should attempt to develop skill in constructing
and using them effectively.

The analysis and interpretation of data represent the application of
deductive and inductive logic to the research process. The data are often
classified by division into subgroups and then analyzed and synthesized in
such a way that hypotheses may be verified or rejected. The final result
may be a new principle or generalization. Data are examined in terms of
comparisons between the more homogeneous segments within the whole
group and by comparison with some outside criteria.

The processes of classification, sorting, and tabulation of data are
important parts of the research process. In extensive studies, mechanical
and/or computer methods of sorting and tabulating are used to save time
and effort and to minimize error. In smaller projects, hand-sorting and
hand-tabulating processes are still often employed.

The researcher must guard against the limitations and sources of error
inherent in the processes of analysis and interpretation of data.


EXERCISES


1. For what type of problem and under what circumstances would you find the 
following data-gathering techniques most appropriate: 
a. Likert scale 
b. Questionnaire 
c. Interview 
d. Observation 
e. Q-sort 
2. Construct a short questionnaire that could be administered in class. The fol- 
lowing topics are suggested: 
a. Leisure Interests and Activities 
b. Reasons for Selecting Teaching as a Profession 
c. Methods of Dealing with School Discipline 
d. Political Interests and Activities 
3. Construct a Likert-type opinionnaire dealing with a controversial problem. One 
of the following topics may be appropriate: 
a. Teacher Affiliation with Professional Organizations 
b. Teacher Strikes and Sanctions 
c. Religious Activities in the School Program 
d. The Nongraded School 
4. Construct a short rating scale to be used for the evaluation of the teaching
performance of a probationary teacher.
5. To what extent is the administration of personal and social adjustment inven- 
tories an invasion of a student's privacy? 
DESCRIPTIVE DATA ANALYSIS


Because this textbook concentrates on educational research, the following 
discussion of statistical analysis is in no sense complete or exhaustive. Only 
some of the most simple and basic concepts are presented. Students whose 
mathematical experience includes high school algebra should be able to 
understand the logic and the computational processes involved and should 
be able to follow the examples without difficulty. 

The purpose of this discussion is threefold: 


1. To help the student, as a consumer, develop an understanding of 
statistical terminology and the concepts necessary to read with under- 
standing some of the professional literature in educational research. 

2. To help the student develop enough competence and know-how to 
carry on research studies using simple types of analysis. 

3. To prepare the student for more advanced coursework in statistics. 


The emphasis is upon intuitive understanding and practical appli-
cation rather than on the derivation of mathematical formulas. Those who
expect and need to develop real competence in educational research will
have to take some of the following steps:


1. Take one or more courses in behavioral statistics and experimental 
design. 

2. Study more specialized textbooks in statistics, particularly those deal- 
ing with statistical inference (e.g., Ferguson, 1981; Glass & Hopkins, 
1984; Guilford & Fruchter, 1978; Hays, 1981; Kirk, 1982; Siegel, 
1956; Winer, 1971). 

3. Read research studies in professional journals extensively and criti- 
cally. 

4. Carry on research studies involving some serious use of statistical 
procedures. 


WHAT IS STATISTICS? 


Statistics is a body of mathematical techniques or processes for gathering, 
organizing, analyzing, and interpreting numerical data. Because most re- 
search yields such quantitative data, statistics is a basic tool of measurement, 
evaluation, and research. 

The word statistics is sometimes used to describe the numerical data 
that are gathered. Statistical data describe group behavior or group char- 
acteristics abstracted from a number of individual observations that are 
combined to make generalizations possible. 

Everyone is familiar with such expressions as "the average family
income," "the typical white-collar worker," or "the representative city."
These are statistical concepts and, as group characteristics, may be ex-
pressed in measurement of age, size, or any other traits that can be de-
scribed quantitatively. When we say that "the average fifth-grade boy is ten
years old," we are generalizing about all fifth-grade boys, not any particular
boy. Thus the statistical measurement is an abstraction that may be used
in place of a great mass of individual measures.

The research worker who uses statistics is concerned with more than 
the manipulation of data. The statistical method serves the fundamental 
purposes of description and analysis, and its proper application involves 
answering the following questions: 


1. What facts need to be gathered to provide the information necessary 
to answer the question or to test the hypothesis? 


2. How are these data to be selected, gathered, organized, and analyzed? 
3. What assumptions underlie the statistical methodology to be em-
ployed?

4. What conclusions can be validly drawn from the analysis of the data?


Research consists of systematic observation and description of the 
characteristics or properties of objects or events for the purpose of dis- 
covering relationships between variables. The ultimate purpose is to de- 
velop generalizations that may be used to explain phenomena and to predict 
future occurrences. To conduct research, we must establish principles so 
that the observation and description have a commonly understood mean- 
ing. Measurement is the most precise and universally accepted process of 
description, assigning quantitative values to the properties of objects and 
events. 


PARAMETRIC AND NONPARAMETRIC DATA 
In the application of statistical treatments, two types of data are recognized. 


1. Parametric data. Data of this type are measured data, and parametric 
statistical tests assume that the data are normally or nearly normally 
distributed. Parametric tests are applied to both interval- and ratio- 
scaled data. 

2. Nonparametric data. Data of this type are either counted or ranked. 
Nonparametric tests, sometimes known as distribution-free tests, do 
not rest upon the more stringent assumption of normally distributed 
populations. 


Table 8-1 presents a graphic summary of the levels of quantitative
description and the types of statistical analysis appropriate for each level.
These concepts will be developed later in the discussion.

However, the reader should be aware that many of the parametric
statistics (t test, analysis of variance, and Pearson's r in particular) are still
appropriate even when the assumption of normality is violated. This ro-
bustness has been demonstrated for the t test, analysis of variance, and, to
a lesser extent, analysis of covariance by a number of researchers including
Glass, Peckham, and Sanders (1972), Lunney (1970), and Mandeville (1972).
Thus, with ordinal data and even with dichotomous data (two choices such
as Pass-Fail), these statistical procedures, which were designed for use with
interval and ratio data, may be appropriate and useful. Pearson's r, which
can also be used with any type of data, will be discussed later in this chapter.
TABLE 8-1 Levels of Quantitative Description*


                                               DATA            SOME APPROPRIATE
LEVEL  SCALE     PROCESS                       TREATMENT       TESTS
------------------------------------------------------------------------------------
4      Ratio     measured equal intervals,                     t test
                 true zero,                                    analysis of variance
                 ratio relationship            parametric      analysis of covariance
3      Interval  measured equal intervals,                     factor analysis
                 no true zero                                  Pearson's r
------------------------------------------------------------------------------------
2      Ordinal   ranked in order                               Spearman's rho (ρ)
                                                               Mann-Whitney
                                               nonparametric   Wilcoxon
1      Nominal   classified and counted                        chi square
                                                               median
                                                               sign


*Refer to Chapter 7 for a discussion of the four levels of measurements.


DESCRIPTIVE AND INFERENTIAL ANALYSIS 


Up until now we have not discussed the limits to which statistical analysis 
may be generalized. Two types of statistical application are relevant. 


Descriptive analysis. Descriptive statistical analysis limits generaliza- 
tion to the particular group of individuals observed. No conclusions are 
extended beyond this group, and any similarity to those outside the group 
cannot be assumed. The data describe one group and that group only. 
Much simple action research involves descriptive analysis and provides 
valuable information about the nature of a particular group of individuals. 


Inferential analysis. Inferential statistical analysis always involves the 
process of sampling and the selection of a small group that is assumed to 
be related to the population from which it is drawn. The small group is 
known as the sample, and the large group is the population. Drawing con- 
clusions about populations based upon observations of samples is the pur- 
pose of inferential analysis. 

A statistic is a measure based on observations of the characteristics of
a sample. A statistic computed from a sample may be used to estimate a
parameter, the corresponding value in the population from which the sample
is selected. Statistics are usually represented by letters of our Roman al-
phabet such as X̄, S, and r. Parameters, on the other hand, are usually
represented by letters of the Greek alphabet such as μ, σ, and ρ.

Before any assumptions can be made, it is essential that the individuals 
selected be chosen in such a way that the small group, or sample, approx- 
imates the larger group, or population. Within a margin of error, which 
is always present, and by the use of appropriate statistical techniques, this 
approximation can be assumed, making possible the estimation of popu- 
lation characteristics by an analysis of the characteristics of the sample. 
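
The distinction between a statistic and a parameter can be illustrated with a small simulation. Everything in this sketch is assumed for the purpose of illustration: the population is artificial (10,000 simulated scores), and the sample size of 100 is arbitrary.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the illustration is repeatable

# Artificial population of 10,000 scores centered near 75.
population = [random.gauss(75, 10) for _ in range(10_000)]
mu = mean(population)                    # parameter: population mean

sample = random.sample(population, 100)  # simple random sample, no replacement
x_bar = mean(sample)                     # statistic: sample mean

sampling_error = x_bar - mu              # margin by which the estimate misses
print(f"parameter (population mean): {mu:.2f}")
print(f"statistic (sample mean):     {x_bar:.2f}")
print(f"sampling error:              {sampling_error:+.2f}")
```

In real research the parameter is unknown, which is why the estimation of error treated in Chapter 9 is needed; the simulation can display the error only because the whole population was generated.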

It should be emphasized that when data are derived from a group 
without careful sampling procedures, the researcher should carefully state 
that findings apply only to the group observed and may not apply to or 
describe other individuals or groups. The statistical theory of sampling is 
complex and involves the estimation of error of inferred measurements, 
error that is inherent in estimating the relationship between a random 
sample and the population from which it is drawn. Inferential data analysis 
is presented in Chapter 9.


THE ORGANIZATION OF DATA 


The list of test scores in a teacher's grade-book provides an example of 
unorganized data. Because the usual method of listing is alphabetical, the 
scores are difficult to interpret without some other type of organization. 


Alberts, James    60
Brown, John       78
Davis, Mary       90
Smith, Helen      70
Williams, Paul    88


The array. Arranging the same scores in descending order of mag- 
nitude produces what is known as an array. 


90 
88 
78 
70 
60 
TABLE 8-2 Scores of 37 Students on a Semester Algebra Test


98    85    80    76    67
97    85    80    76    67
95    85    80    75    64
93    84    80    73    60
90    82    78    72    57
88    82    78    70
87    82    78
87    80    77


Range = 98 - 57 + 1 = 42


The array provides a more convenient arrangement. The highest 
score (90), the lowest score (60), and the middle score (78) are easily iden- 
tified. Thus the range (the difference between the highest and lowest scores, 
plus one) can easily be determined. 

Table 8-2 illustrates an ungrouped data arrangement in array form.
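
The array and the range are easily produced by computer. The sketch below uses the five grade-book scores listed above; the function names `to_array` and `inclusive_range` are illustrative choices.

```python
def to_array(scores):
    """Arrange scores in descending order of magnitude."""
    return sorted(scores, reverse=True)


def inclusive_range(scores):
    """Highest score minus lowest score, plus one."""
    return max(scores) - min(scores) + 1


gradebook = {
    "Alberts, James": 60,
    "Brown, John": 78,
    "Davis, Mary": 90,
    "Smith, Helen": 70,
    "Williams, Paul": 88,
}

array = to_array(gradebook.values())
middle = array[len(array) // 2]  # middle score of an odd-sized array
print("Array: ", array)
print("Middle:", middle)
print("Range: ", inclusive_range(array))
```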


Grouped Data Distributions 


Data are often more clearly presented when scores are grouped and a
frequency column is included. Data can be presented in frequency tables
(see Table 8-3) with different class intervals, depending on the number
and range of the scores.

A score interval with an odd number of units may be preferable
because its midpoint is a whole number rather than a fraction. Because all
scores are assumed to fall at the midpoint (for purposes of computing the
mean), the computation is less complicated:


Even interval of four: 8 9 10 11 (midpoint 9.5)
Odd interval of five: 8 9 10 11 12 (midpoint 10)


TABLE 8-3 Scores on Algebra Test Grouped in Intervals of Five


SCORE INTERVAL   TALLIES          FREQUENCY (f)   INCLUDES
96-100           ||               2               (96 97 98 99 100)
91-95            ||               2               (91 92 93 94 95)
86-90            ||||             4               etc.
81-85            ||||| ||         7
76-80            ||||| ||||| |    11
71-75            |||              3
66-70            |||||            5
61-65            |                1
56-60            ||               2
                                  N = 37


There is no rule that rigidly determines the proper score interval, 
and intervals of ten are frequently used. 
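
Grouping scores into class intervals can be sketched as follows. The fourteen scores are hypothetical, and the function `grouped_frequencies` is an illustration; it assumes, as in Table 8-3, that each interval of five begins on a score ending in 1 or 6.

```python
from collections import Counter


def grouped_frequencies(scores, width=5):
    """Map (low, high) class intervals to frequencies, highest interval first."""
    lowest = min(scores)
    # Assumed convention: lower limits fall one unit above a multiple
    # of the width (56, 61, 66, ... when width is 5).
    start = lowest - (lowest - 1) % width
    counts = Counter((score - start) // width for score in scores)
    return {
        (start + i * width, start + i * width + width - 1): counts[i]
        for i in range(max(counts), -1, -1)
    }


# Hypothetical test scores.
scores = [98, 93, 88, 87, 85, 82, 80, 78, 78, 76, 73, 67, 64, 57]

distribution = grouped_frequencies(scores)
for (low, high), f in distribution.items():
    midpoint = (low + high) / 2
    print(f"{low}-{high:<5}{'|' * f:<6}{f:>3}   midpoint {midpoint:g}")
```

With a width of five every midpoint is a whole number (58, 63, 68, and so on), which is the convenience of odd-sized intervals mentioned above.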


STATISTICAL MEASURES 


Several basic types of statistical measures are appropriate in describing and 
analyzing data in a meaningful way. 


Measures of central tendency or averages 
Mean 
Median 
Mode 

Measures of spread or dispersion 
Range 
Variance 
Standard deviation 

Measures of relative position 
Percentile rank 
Percentile score 
Standard scores 

Measures of relationship 
Coefficient of correlation 


Measures of Central Tendency 


Nonstatisticians use averages to describe the characteristics of groups in a 
general way. The climate of an area is often noted by average temperature 
or average amount of rainfall. We may describe students by grade-point 
averages or by average age. Socioeconomic status of groups is indicated by 
average income, and the return on an investment portfolio may be judged 
in terms of average income return. But to the statistician, the term average 
is unsatisfactory, for there are a number of types of averages, only one of 
which may be appropriate to use in describing given characteristics of a 
group. Of the many averages that may be used, three have been selected
as most useful in educational research: the mean, the median, and the
mode.


The mean (X̄). The mean of a distribution is commonly understood
as the arithmetic average. The term grade-point average, familiar to students,
is a mean value. It is computed by dividing the sum of all the scores by the
number of scores. In formula form:


\[ \bar{X} = \frac{\sum X}{N} \]

where X̄ = mean
      Σ = sum of
      X = scores in a distribution
      N = number of scores

EXAMPLE

X
6
5
4
3
2
1

ΣX = 21
N = 6
X̄ = 21/6 = 3.50


The mean is probably the most useful of all statistical measures, for, 
in addition to the information that it provides, it is the base from which 
many other important measures are computed. 
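
The computation in the example can be checked with a few lines of code. The function name `mean` is simply an illustrative label for the formula above.

```python
def mean(scores):
    """Sum of all the scores divided by the number of scores."""
    return sum(scores) / len(scores)


scores = [6, 5, 4, 3, 2, 1]
total = sum(scores)   # sum of X
n = len(scores)       # N
x_bar = mean(scores)  # mean of X
print(f"sum = {total}, N = {n}, mean = {x_bar:.2f}")
```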


The median (Md). The median is a point (not necessarily a score) in 
an array, above and below which one-half of the scores fall. It is a measure 
of position rather than of magnitude and is frequently found by inspection 
rather than by calculation. When there are an odd number of untied scores, 
the median is the middle score, as in the example below. 


    3 scores above
    median (the middle score)
    3 scores below


When there are an even number of untied scores, the median is the
midpoint between the two middle scores, as in the example below.


6
5    3 scores above
4
     <-- median = 3.50
3
2    3 scores below
1

If the data include tied scores at the median point, interpolation
within the tied scores will be necessary. Each integer would be thought of 
as representing the interval from halfway between it and the next lower 
score to halfway between it and the next higher score. When ties occur at 
the midpoint of a set of scores, we portion out this interval into the number 
of tied scores and find the midpoint or median. Consider the set of scores 
in Figure 8-1. 

Because there are four scores tied, we divide the interval from 74.5 
to 75.5 into four equal parts. Each of the scores is then considered to occupy 
0.25 of the interval, and the median is calculated. 
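
The three cases (odd number of untied scores, even number of untied scores, and ties at the midpoint) can be sketched in one routine. The function `median_with_ties` is illustrative and assumes whole-number scores, so that each score stands for an interval one unit wide. The last set of eight scores is hypothetical; it was chosen so that four scores of 75 are tied at the midpoint.

```python
def median_with_ties(scores):
    """Median of whole-number scores, interpolating within tied middle scores."""
    ordered = sorted(scores)
    n = len(ordered)
    low_mid, high_mid = ordered[(n - 1) // 2], ordered[n // 2]
    if low_mid != high_mid:
        # Even number of scores, untied middle pair: take the midpoint.
        return (low_mid + high_mid) / 2
    tied = ordered.count(low_mid)
    if tied == 1:
        # Odd number of scores, untied middle score.
        return float(low_mid)
    below = sum(1 for score in ordered if score < low_mid)
    lower_limit = low_mid - 0.5
    # Each tied score occupies 1/tied of the unit interval.
    return lower_limit + (n / 2 - below) / tied


print(median_with_ties([7, 6, 5, 4, 3, 2, 1]))                # odd, untied
print(median_with_ties([6, 5, 4, 3, 2, 1]))                   # even, untied
print(median_with_ties([70, 73, 74, 75, 75, 75, 75, 80]))     # tied at midpoint
```

In the tied case, three scores lie below the lower limit of 74.5, so one of the four tied scores is needed to reach half of the eight scores, and the median falls one quarter of the way into the interval.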

FIGURE 8-1 Median calculation.


70
73
74
           74.50 <-- lower limit
75  (0.25)
           74.75 <-- median
75  (0.25)
           75.00
75  (0.25)
           75.25
75  (0.25)
           75.50 <-- upper limit


One purpose of the mean and the median is to represent the "typical"
score; most of the time we are satisfied to use the mean for this purpose.
However, when the distribution of scores is such that most scores are at
one end and relatively few are at the other (known as a skewed distribution),
the median is preferable because it is not influenced by extreme scores at
either end of the distribution. In the following examples, the medians are
identical. However, the mean of Group A is 4 and the mean of Group B
is 10. The mean and median are both representative of Group A, but the
median better represents the "typical" score of Group B.


GROUP A GROUP B 
7 50 


Thus in skewed data distributions, the median is a more realistic
measure of central tendency than the mean.

In a small school with five faculty members, the salaries might be:


Teacher A    $36,000
        B     22,000
        C     21,400   Md
        D     21,000
        E     19,600
            $120,000


\[ \bar{X} = \frac{120{,}000}{5} = 24{,}000 \]


The average salary of the group is represented with a different em-
phasis by the median salary ($21,400) than by the mean salary ($24,000),
which is substantially higher than that of four of the five faculty members.
Thus we see again that the median is less sensitive than the mean to extreme
values at either end of a distribution.
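
The salary example can be verified with Python's standard `statistics` module. The figures are those given above; the final lines, which replace the highest salary with a hypothetical $90,000, are an added illustration of how differently the two measures react to an extreme value.

```python
from statistics import mean, median

salaries = [36_000, 22_000, 21_400, 21_000, 19_600]

mean_salary = mean(salaries)
median_salary = median(salaries)
print(f"mean   = ${mean_salary:,.0f}")
print(f"median = ${median_salary:,.0f}")

# Hypothetical change: raise only the highest salary.
extreme = [90_000] + salaries[1:]
print(f"mean with extreme value   = ${mean(extreme):,.0f}")
print(f"median with extreme value = ${median(extreme):,.0f}")
```

The mean moves with the extreme salary while the median stays at the middle teacher's salary.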


The mode (Mo). The mode is the score that occurs most frequently in a distribution.
It is located by inspection rather than by computation. In grouped data
distributions, the mode is assumed to be the midscore of the interval in
which the greatest frequency occurs.

For example, if the modal age of fifth-grade children is ten years, it 
follows that there are more ten-year-old fifth-graders than any other age. 
Or a menswear salesman might verify the fact that there are more sales of 
size 40 suits than of any other size; consequently, a larger number of size 
40 suits are ordered and stocked, size 40 being the mode. 

In some distributions there may be more than one mode. A two-mode 
distribution is bimodal, more than two, multimodal. If the number of auto 
accidents on the streets of a city were tabulated by hours of occurrence, it 
is likely that two modal periods would become apparent— between 7 and 
8 A.M. and between 5 and 6 P.M., the hours when traffic to and from stores 
and offices is heaviest and when drivers are in the greatest hurry. In a 
normal distribution.of data there is one mode, and it falls at the midpoint, 
just as the mean and median do. In some unusual distributions, however, 
the mode may fall at some other point. When the mode or modes reveal 
such unusual behavior, they do not serve as measures of central tendency, 
but they do reveal useful information about the nature of the distribution. 
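As a rough illustration, the mode or modes of a small set of scores can be found by counting frequencies. The suit sizes and accident hours below are hypothetical values invented for this sketch; `multimode` is part of Python's standard `statistics` module (Python 3.8 or later).

```python
from statistics import multimode

# Hypothetical suit sizes sold in one day (illustrative values only).
suit_sizes = [38, 40, 42, 40, 44, 40, 38, 40]
suit_modes = multimode(suit_sizes)          # a single mode

# Hypothetical hours (24-hour clock) at which accidents occurred.
accident_hours = [7, 7, 7, 12, 17, 17, 17, 20]
accident_modes = multimode(accident_hours)  # two modes: a bimodal distribution

print(suit_modes, accident_modes)
```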


Measures of Spread or Dispersion 


Measures of central tendency describe location along an ordered scale. 
There are characteristics of data distributions that call for additional types 
of statistical analysis. The scores in Table 8-4 were made by a group of 
students on two different tests, one in reading and one in arithmetic. 

The mean and the median are identical for both tests. It is apparent 
that averages do not fully describe the differences in achievement between 
students' scores on the two tests. To contrast their performance, it is nec- 
essary to use a measure of score spread or dispersion. The arithmetic test 
scores are homogeneous, with little difference between adjacent scores. The
reading test scores are decidedly heterogeneous, with performances rang- 
ing from superior to very poor. 


The range. The range, the simplest measure of dispersion, is the difference
between the highest and lowest scores plus one. For reading scores,
the range is 41 (95 - 55 + 1). For arithmetic scores, the range is 9 (79 -
71 + 1).


TABLE 8-4 Sample Data


                 READING               ARITHMETIC
                     Academic               Academic
Pupil         Score  Grade           Score  Grade
Arthur         95      A              76      C
Betty          90      A              78      C
John           85      B              77      C
Katherine      80      B              71      C
Charles        75      C              75      C
Larry          70      C              79      C
Donna          65      D              73      C
Edward         60      D              72      C
Mary           55      F              74      C

$\Sigma X$ = 675                     $\Sigma X$ = 675
N = 9                                N = 9
$\bar{X}$ = 675/9 = 75               $\bar{X}$ = 675/9 = 75
Md = 75                              Md = 75


The deviation from the mean (x). A score expressed as its distance from
the mean is called a deviation score. Its formula is:

$$x = (X - \bar{X})$$


If the score falls above the mean, the deviation score is positive (+); if it
falls below the mean, the deviation score is negative (-).
Using the same example, compare two sets of scores:


      READING                     ARITHMETIC
  X     $(X - \bar{X})$          X     $(X - \bar{X})$
 95       +20                   76       +1
 90       +15                   78       +3
 85       +10                   77       +2
 80        +5                   71       -4
 75         0                   75        0
 70        -5                   79       +4
 65       -10                   73       -2
 60       -15                   72       -3
 55       -20                   74       -1

$\Sigma X$ = 675   $\Sigma x$ = 0        $\Sigma X$ = 675   $\Sigma x$ = 0

N = 9                           N = 9

$\bar{X}$ = 75                  $\bar{X}$ = 75


It is interesting to note that the sum of the score deviations from the 
mean equals zero. 


$$\Sigma (X - \bar{X}) = 0$$
$$\Sigma x = 0$$


In fact, we can give an alternative definition of the mean: The mean
is that value in a distribution around which the sum of the deviation scores
equals zero.
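The zero-sum property can be verified on the reading and arithmetic scores from Table 8-4. This is a minimal sketch; the function name `deviation_scores` is an illustrative choice.

```python
# Scores from Table 8-4.
reading = [95, 90, 85, 80, 75, 70, 65, 60, 55]
arithmetic = [76, 78, 77, 71, 75, 79, 73, 72, 74]


def deviation_scores(scores):
    """Return each score expressed as its distance from the mean."""
    mean = sum(scores) / len(scores)
    return [x - mean for x in scores]


reading_dev = deviation_scores(reading)
arithmetic_dev = deviation_scores(arithmetic)

# The deviations above the mean cancel the deviations below it.
print(sum(reading_dev), sum(arithmetic_dev))
```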


The variance ($\sigma^2$). The sum of the squared deviations from the mean,
divided by N, is known as the variance. We have noted that the sum of the
deviations from the mean equals zero ($\Sigma x = 0$). From a mathematical point
of view it would be impossible to find a mean value to describe these
deviations (unless the signs were ignored). Squaring each deviation score
yields a value that is never negative. The squared deviations can then be
summed and divided by N to give the mean of the squared deviations. The
variance formula is:

$$\sigma^2 = \frac{\Sigma (X - \bar{X})^2}{N} = \frac{\Sigma x^2}{N}$$

Thus the variance is a value that describes how all of the scores in a
distribution are dispersed or spread about the mean. This value is very
useful in describing the characteristics of a distribution and will be employed
in a number of very important statistical tests. However, since all
of the deviations from the mean have been squared to find the variance,
it is expressed in squared units and is much too large to represent the
spread of scores.


The standard deviation ($\sigma$). The standard deviation, the square root of
the variance, is most frequently used as a measure of spread or dispersion
of scores in a distribution. The formula for standard deviation is:

$$\sigma = \sqrt{\frac{\Sigma x^2}{N}}$$


In the following example using the reading scores from Table 8-4, the
variance and the standard deviation are computed.


  X       x      $x^2$
 95     +20     +400
 90     +15     +225
 85     +10     +100
 80      +5     + 25
 75       0        0
 70      -5     + 25
 65     -10     +100
 60     -15     +225
 55     -20     +400

          $\Sigma x^2$ = 1500


variance $\sigma^2$ = 1500/9 = 166.67
standard deviation $\sigma = \sqrt{1500/9} = \sqrt{166.67}$ = 12.91

As can clearly be seen, a variance of 166.67 cannot represent, for most
purposes, a spread of scores with a total range of only 41, but the standard
deviation of 12.91 does make sense.

Although the deviation approach (just used in the previous calculation)
provides a clear example of the meaning of variance and standard
deviation, in actual practice the deviation method can be awkward to use
in computing the variances or standard deviations for a large number of
scores. A less complicated method, which results in the same answer, uses
the raw scores instead of the deviation scores. The number values tend to
be large, but the use of a calculator facilitates the computation.


$$\text{variance } \sigma^2 = \frac{N\Sigma X^2 - (\Sigma X)^2}{N^2}$$

$$\text{standard deviation } \sigma = \sqrt{\frac{N\Sigma X^2 - (\Sigma X)^2}{N^2}}$$


The following example demonstrates the process of computation,
using the raw score method:


  X      $X^2$
 95     9025
 90     8100
 85     7225
 80     6400
 75     5625
 70     4900
 65     4225
 60     3600
 55     3025
$\Sigma X$ = 675   $\Sigma X^2$ = 52,125
N = 9

$$\sigma^2 = \frac{9(52{,}125) - (675)^2}{9(9)} = \frac{469{,}125 - 455{,}625}{81} = \frac{13{,}500}{81} = 166.67$$

$$\sigma = \sqrt{166.67} = 12.91$$
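The two routes to the variance, the deviation method and the raw score method, can be compared in a few lines of Python. This sketch uses the reading scores from Table 8-4 and divides by N, as the formulas in this section do; the variable names are illustrative.

```python
from math import isclose, sqrt

# Reading scores from Table 8-4.
reading = [95, 90, 85, 80, 75, 70, 65, 60, 55]
n = len(reading)

# Deviation method: mean of the squared deviations from the mean.
mean_x = sum(reading) / n
variance_dev = sum((x - mean_x) ** 2 for x in reading) / n

# Raw score method: (N * sum of X squared - (sum of X) squared) / N squared.
sum_x = sum(reading)
sum_x2 = sum(x * x for x in reading)
variance_raw = (n * sum_x2 - sum_x ** 2) / n ** 2

# The standard deviation is the square root of the variance.
std_dev = sqrt(variance_raw)

print(round(variance_dev, 2), round(variance_raw, 2), round(std_dev, 2))
```

Both methods divide by N, so they correspond to the population variance (`statistics.pvariance` in the Python standard library), not to the sample variance that divides by N - 1.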


The standard deviation is a very useful device for comparing char- 
acteristics that may be quite different or may be expressed in different 
units of measurement. The discussion that follows shows that when the
normality of distributions can be assumed, it is possible to compare the
proverbial apples and oranges. The standard deviation is independent of 
the magnitude of the mean and provides a common unit of measurement. 
To use a rather farfetched example, imagine a man whose height is one 
standard deviation below the mean and whose weight is one standard de- 
viation above the mean. Because we assume that there is a normal rela- 
tionship between height and weight (or that both characteristics are nor- 
mally distributed), we have a picture of a short, overweight individual. His 
height, expressed in inches, is in the lowest 16 percent of the population, 
and his weight, expressed in pounds, is in the highest 16 percent. 

This concept is developed later, but before we discuss using the stand- 
ard deviation to describe status or position in a group, we need to examine 
the normal distribution. 


NORMAL DISTRIBUTION 


The earliest mathematical analysis of the theory of probability dates to the 
eighteenth century. Abraham De Moivre, a French mathematician, discovered
that a mathematical relationship explained the probabilities associated
with various games of chance. He developed the equation and the graphic
pattern that describes it. During the nineteenth century, a French astronomer,
Laplace, and a German mathematician, Gauss, independently arrived
at the same principle and applied it more broadly to areas of measurement
in the physical sciences. From the limited applications made by these early
mathematicians and astronomers, the theory of probability, or the curve
of distribution of error, has been applied to data gathered in the areas of
biology, psychology, sociology, and other sciences. The theory describes
the fluctuations of chance errors of observation and measurement. It is
necessary to understand the theory of probability and the nature of the 
curve of normal distribution in order to comprehend many important 
statistical concepts, particularly in the area of standard scores, the theory 
of sampling, and inferential statistics. 

The law of probability and the normal curve that illustrates it are 
based upon the law of chance or the probable occurrence of certain events. 
When any body of observations conforms to this mathematical form, it can 
be represented by a bell-shaped curve with definite characteristics (see
Figure 8-2).


1. The curve is symmetrical around its vertical axis.
2. The terms cluster around the center (the median).
3. The mean, median, and the mode of the distribution have the same
value.
4. The curve has no boundaries in either direction, for the curve never
touches the base line, no matter how far it is extended. The curve is
a curve of probability, not of certainty.


FIGURE 8-2 The normal curve.


The operation of chance prevails in the tossing of coins or dice. It is 
believed that many human characteristics respond to the influence of chance. 
For example, if certain limits of age, race, and gender were kept constant, 
such measures as height, weight, intelligence, and longevity would approximate
the normal distribution pattern. But the normal distribution
does not appear exactly in data based upon observations of samples. There just
are not enough observations. The normal distribution is based upon an
infinite number of observations beyond the capability of any observer; thus
there is usually some observed deviation from the symmetrical pattern. But
for purposes of statistical analysis, it is assumed that many characteristics
do conform to this mathematical form within certain limits, providing a
convenient reference.

The concept of measured intelligence is based upon the assumption 
that intelligence is normally distributed throughout limited segments of 
the population. Tests are so constructed (standardized) that scores are 
normally distributed in the large group that is used for the determination 
of norms or standards. Insurance companies determine their premium 
rates by the application of the curve of probability. Basing their expectation 
on observations of past experience, they can estimate the probabilities of 
survival of a man from age 45 to 46. They do not purport to predict the 
survival of a particular individual, but from a large group they can predict 
the mortality rate of all insured risks. 

The total area under the normal curve may be considered to approach 
100 percent probability. Interpreted in terms of standard deviations, areas 
between the mean and the various standard deviations from the mean 
under the curve show these percentage relationships (Figure 8-3).


FIGURE 8-3 Percentage of frequencies in a normal distribution falling within a range of a given number
of standard deviations from the mean.


Note the graphic conformation of the characteristics of the normal
curve:


It is symmetrical: the percentage of frequencies is the same for equal
intervals below or above the mean.


The terms or scores “cluster” or “crowd around the mean”: note
how the percentages in a given standard deviation are greatest around
the mean and decrease as one moves away from the mean.


$\bar{X}$ to ±1.00z       34.13%

±1.00 to ±2.00z   13.59%

±2.00 to ±3.00z    2.15%

The curve is highest at the mean: the mean, median, and mode have
the same value.


The curve has no boundaries: a small fraction of 1 percent of the
space falls outside of ±3.00 standard deviations from the mean.


The normal curve is a curve that also describes probabilities. For
example, if height is normally distributed for a given segment of the population,
the chances are 34.13/100 that a person selected at random will be between
the mean and one standard deviation above the mean in height, and 34.13/100
that the person selected will be between the mean and one standard deviation
below the mean in height, or 68.26/100 that the selected person will be
within one standard deviation (above or below) the mean in height. Another
interpretation is that 68.26 percent of this population segment will be
between the mean and one standard deviation above or below the mean
in height.

An example may help the reader understand this concept. IQ (intel- 
ligence quotient) is assumed to be normally distributed. The Wechsler 
Intelligence Scale for Children — Revised (WISC-R) has a mean of 100 and 
a standard deviation of 15. Thus, a WISC-R IQ score that is one standard 
deviation above the mean is 115, and a score of 85 is one standard deviation 
below the mean. From this information we know that approximately 68 
percent of the population should have WISC-R scores between 85 and 115. 
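The WISC-R illustration can be reproduced with the `NormalDist` class from Python's standard `statistics` module (Python 3.8 or later). The sketch below assumes, as the text does, that scores are exactly normally distributed with a mean of 100 and a standard deviation of 15.

```python
from statistics import NormalDist

# Assumed model: normally distributed scores, mean 100, standard deviation 15.
wisc = NormalDist(mu=100, sigma=15)

# Proportion of the population expected between 85 and 115 (within one SD).
within_one_sd = wisc.cdf(115) - wisc.cdf(85)

# Proportion expected between 70 and 130 (within two SDs).
within_two_sd = wisc.cdf(130) - wisc.cdf(70)

print(round(within_one_sd, 4), round(within_two_sd, 4))
```

`cdf(x)` gives the proportion of the area under the curve that lies below `x`, so the difference of two `cdf` values is the area between two scores.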

For practical purposes the curve is usually extended to ±3 standard
deviations from the mean (±3z). Most events or occurrences (or probabilities)
will fall between these limits. The probability is 99.74/100 that these limits
account for observed or predicted occurrences. This statement does not 
suggest that events or measures could not fall more than three standard 
deviations from the mean but that the likelihood would be too small to 
consider when making predictions or estimates based upon probability. 
Statisticians deal with probabilities, not certainty, and there is always a 
degree of reservation in making any prediction. Statisticians deal with the 
probabilities that cover the normal course of events, not the events that 
are outside the normal range of experience. 


Nonnormal Distributions 


As mentioned earlier in our discussions of parametric and nonparametric 
data and the relative usefulness of the mean and median, not all distri- 
butions, particularly of sample data, are identical to or even close to a 
normal curve. There are two other types of distributions that can occur: 
skewed and bimodal. With skewed distributions, the majority of scores are 
near the high or low end of the range, with relatively few scores at the 
other end. The distribution is considered skewed in the direction of the 
tail (fewest scores). In Figure 8-4, distribution A is skewed positively and
distribution B is skewed negatively. Skewed distributions can be caused by
a number of factors, including a test that is too easy or hard, or an atypical
sample (very bright or very low intelligence).


FIGURE 8-4 Nonnormal distributions.


Bimodal distributions have two modes (see distribution C in Figure
8-4) rather than the single mode of normal or skewed distributions. This
often results from a sample that consists of persons from two populations. 
For instance, the height of American adults would be bimodally distributed, 
females clustering around a mode of about 5 feet 4 inches, and males 
around a mode of about 5 feet 10 inches. 


Interpreting the Normal Probability Distribution


When scores are normally or near normally distributed, a normal probability
table is useful. The values presented in the normal probability table
in Appendix B are critical because they provide data for normal distributions
that may be interpreted in the following ways:


1. The percentage of total space included between the mean and a given
sigma distance (z) from the mean

2. The percentage of cases, or the number when N is known, that fall
between the mean and a given sigma distance (z) from the mean

3. The probability that an event will occur between the mean and a given
sigma distance (z) from the mean


$$z = \frac{X - \bar{X}}{\sigma}$$

where z = number of standard deviations from the mean


Figure 8-5 demonstrates how the area under the normal curve can be
divided.


FIGURE 8-5 The space included under the normal curve between the mean and +1.00z.


In a normal distribution the following characteristics hold true:


1. The space included between the mean and +1.00z is .3413 of the
total area under the curve.

2. The percentage of cases that fall between the mean and +1.00z is
34.13.

3. The probability of an event occurring (observation) between the mean
and +1.00z is .3413.

4. The distribution is divided into two equal parts, one half above the
mean and the other half below the mean.

5. Because one half of the curve is above the mean and .3413 of the
total area is between the mean and +1.00z, the area of the curve that
is above +1.00z is .1587.


Because the normal probability curve is symmetrical, the shape of the 
right side (above the mean) is identical to the shape of the left side (below 
the mean). As the values for each side of the curve are identical, only one 
set of values is presented in the probability table, expressed to one-hundredth
of a sigma (standard deviation) unit.

The normal probability table in Appendix B provides the proportion 
of the curve that is between the mean and a given sigma (z) value. The 
remainder of that half of the curve is beyond the sigma value. 


                    PROBABILITY
above the mean      .5000                       50/100
below the mean      .5000                       50/100

above +1.96z        .5000 - .4750 = .0250       2.5/100
below +.32z         .5000 + .1255 = .6255       62.5/100
below -.32z         .5000 - .1255 = .3745       37.5/100
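The table lookups above can be approximated without a printed table. This sketch uses `NormalDist` from Python's standard `statistics` module as a stand-in for the table in Appendix B; the helper name `area_mean_to` is an illustrative choice.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist()


def area_mean_to(z):
    """Proportion of the curve between the mean and a given z value."""
    return abs(standard_normal.cdf(z) - 0.5)


area_to_1 = area_mean_to(1.00)                    # mean to +1.00z
above_1 = 0.5 - area_to_1                         # beyond +1.00z
above_196 = 0.5 - area_mean_to(1.96)              # above +1.96z
below_pos_032 = 0.5 + area_mean_to(0.32)          # below +.32z
below_neg_032 = 0.5 - area_mean_to(-0.32)         # below -.32z

for value in (area_to_1, above_1, above_196, below_pos_032, below_neg_032):
    print(round(value, 4))
```

Each result is built the same way as a table lookup: take the area between the mean and z, then add it to or subtract it from the .5000 that lies on one side of the mean.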


Practical Applications of the Normal Curve 


In the field of educational research the normal curve has a number of 
practical applications: 


1. To calculate the percentile rank of scores in a normal distribution.

2. To normalize a frequency distribution, an important process in standardizing
a psychological test or inventory.

3. To test the significance of observed measures in experiments, relating
them to the chance fluctuations or errors that are inherent in the
process of sampling and generalizing about populations from which
the samples are drawn.


MEASURES OF RELATIVE POSITION: STANDARD SCORES


Standard scores provide a method of expressing any score in a distribution 
in terms of its distance from the mean in standard deviation units. The 
utility of this conversion of a raw score to a standard score will become 
clear as each type is introduced and illustrated. Three types of standard 
scores are considered.


1. Sigma Score (z)
2. T Score (T)
3. College Board Score ($Z_{cb}$)


Remember that the distribution is assumed to be normal when using any 
type of standard score. 


The Sigma Score (z). In describing a score in a distribution, its deviation
from the mean, expressed in standard deviation units, is often more
meaningful than the score itself. The unit of measurement is the standard
deviation.

$$z = \frac{X - \bar{X}}{\sigma} \quad \text{or} \quad \frac{x}{\sigma}$$

where X = raw score
      $\bar{X}$ = mean
      $\sigma$ = standard deviation
      x = $(X - \bar{X})$, score deviation from the mean

EXAMPLE A                               EXAMPLE B
X = 76                                  X = 67
$\bar{X}$ = 82                          $\bar{X}$ = 62
$\sigma$ = 4                            $\sigma$ = 5
z = (76 - 82)/4 = -6/4 = -1.50          z = (67 - 62)/5 = 5/5 = +1.00


The raw score of 76 in Example A may be expressed as a sigma score of
-1.50, indicating that 76 is 1.5 standard deviations below the mean. The
score of 67 in Example B may be expressed as a sigma score of +1.00,
indicating that 67 is one standard deviation above the mean.
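Examples A and B can be expressed as a small Python function. The standard deviations of 4 and 5 used here are inferred from the printed results (a deviation of -6 giving -1.50, and a deviation of 5 giving +1.00) and should be read as assumptions; the function name `sigma_score` is illustrative.

```python
def sigma_score(raw, mean, sd):
    """Express a raw score as its distance from the mean in SD units."""
    return (raw - mean) / sd


# Example A: raw score 76, mean 82, standard deviation assumed to be 4.
z_a = sigma_score(76, 82, 4)

# Example B: raw score 67, mean 62, standard deviation assumed to be 5.
z_b = sigma_score(67, 62, 5)

print(z_a, z_b)
```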

In comparing or averaging scores on distributions where total points 
may differ, the use of raw scores may create a false impression of a basis
for comparison. A sigma score (z) makes possible a realistic comparison of
scores and may provide a basis for equal weighting of the scores. On the
sigma scale, the mean of any distribution is converted to zero and the
standard deviation is equal to 1.

For example, a teacher wishes to determine a student's equally weighted 
average (mean) achievement on an algebra test and on an English test. 


SUBJECT    TEST SCORE    MEAN    HIGHEST POSSIBLE SCORE    STANDARD DEVIATION


It is apparent that the mean of the two raw test scores would not provide 
a valid summary of the student’s performance for the mean would be 
weighted overwhelmingly in favor of the English test score. The conversion 
of each test score to a sigma score makes them equally weighted and com- 
parable, for both test scores have been expressed on a scale with a mean 
of zero and a standard deviation of one. 


$$z = \frac{X - \bar{X}}{\sigma}$$

Algebra z score = -1.40

English z score = (84 - 110)/20 = -26/20 = -1.30


On an equally weighted basis, the performance of the student was fairly 
consistent: 1.40 standard deviations below the mean in algebra and 1.30 
standard deviations below the mean in English. 

Because the normal probability table describes the percentage of area 
lying between the mean and successive deviation units under the normal 
curve (see Appendix B), the use of sigma scores has many other useful 
applications to hypothesis testing, determination of percentile ranks, and 
probability judgments. 


The T score (T).

$$T = 50 + 10\,\frac{(X - \bar{X})}{\sigma} \quad \text{or} \quad 50 + 10z$$

Although the sigma score (z) is most frequently used, it is sometimes
awkward to have negatives or scores with decimals. Therefore, another
version of a standard score, the T score, has been devised to avoid some
confusion resulting from negative z scores (below the mean) and also to
eliminate decimal values.

Multiplying the z score by 10 and adding 50 results in a scale of positive 
whole number values. Using the scores in the previous example, T = 50
+ 10z:


Algebra T = 50 + 10(-1.40) = 50 + (-14) = 36
English T = 50 + 10(-1.30) = 50 + (-13) = 37


T scores are always rounded to the nearest whole number. A sigma 
score of + 1.27 would be converted to a T score of 63. 


T = 50 + 10(+1.27) = 50 + (+12.70) = 62.70 = 63 


The College Board score ($Z_{cb}$). The College Entrance Examination Board
and several other testing agencies use another conversion that provides a
more precise measure by spreading out the scale (see Figure 8-6).

$$Z_{cb} = 500 + 100\,\frac{(X - \bar{X})}{\sigma} = 500 + 100z$$

The mean of this scale is 500.
The standard deviation is 100.
The range is 200-800.
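Both conversions are simple linear transformations of the sigma score, which a short sketch makes concrete. The function names are illustrative. Note that Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from hand rounding in that one special case.

```python
def t_score(z):
    """T score: mean 50, standard deviation 10, rounded to a whole number."""
    return round(50 + 10 * z)


def college_board_score(z):
    """College Board score: mean 500, standard deviation 100."""
    return round(500 + 100 * z)


# Sigma scores taken from the algebra and English example in the text.
algebra_t = t_score(-1.40)
english_t = t_score(-1.30)
converted_t = t_score(1.27)

# z scores of -3, 0, and +3 mark the bottom, middle, and top of the 200-800 scale.
cb_low, cb_mean, cb_high = (college_board_score(z) for z in (-3, 0, 3))

print(algebra_t, english_t, converted_t, cb_low, cb_mean, cb_high)
```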


FIGURE 8-6 A comparison of three types of standard scores.


Percentile rank. Often useful to describe a score in relation to other
scores, the percentile rank is the point in the distribution below which a
given percentage of scores fall. If the eightieth percentile rank is a score
of 65, 80 percent of the scores fall below 65. The median is the fiftieth 
percentile rank, for 50 percent of the scores fall below it. 

When N is small, the definition needs an added refinement. To be 
completely accurate, the percentile rank is the score in the distribution 
below which a given percentage of the scores fall, plus one half the per- 
centage of space occupied by the given score. 

This point can be demonstrated by a rather extreme example. 


SCORES 


50 
47 
43 
39 
30 


Upon inspection it is apparent that 43 is the median, or occupies the 
fiftieth percentile rank. Fifty percent of the scores should fall below it, but 
in fact only two out of five scores fall below 43. That would indicate 43 
has a percentile rank of 40. But by adding the phrase “plus one half the 
percentage of space occupied by the score,” we reconcile the calculation: 


40% of scores fall below 43: each score occupies 20% of the total space 


40% + 10% = 50 (true percentile rank) 


When N is large, this qualification is unimportant because percentile 
ranks are rounded to the nearest whole number, ranging from the highest 
percentile rank of 99 to the lowest of zero. 
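The refined definition can be sketched as a function that counts the scores below a given score and adds half of the space occupied by the score itself. Treating tied scores as sharing that half-space is an assumption of this sketch, since the text's example contains no ties; the function name is illustrative.

```python
def percentile_rank(scores, score):
    """Percent of scores below `score`, plus half the percent at `score`."""
    n = len(scores)
    below = sum(1 for s in scores if s < score)
    at = sum(1 for s in scores if s == score)  # assumption: ties share the space
    return 100 * (below + 0.5 * at) / n


scores = [50, 47, 43, 39, 30]

median_rank = percentile_rank(scores, 43)   # 40% below + 10% = 50
top_rank = percentile_rank(scores, 50)      # 80% below + 10% = 90

print(median_rank, top_rank)
```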

High schools frequently rate their graduating seniors in terms of rank 
in class. Because schools vary so much in size, colleges find these rankings 
of limited value unless they are converted to some common basis for com- 
parison. The percentile rank provides this basis by converting class rank 
into a percentile rank. 


$$\text{Percentile rank} = 100 - \frac{(100RK - 50)}{N}$$

where RK = rank from the top
      N = number of students in the class


Jones ranks twenty-seventh in his senior class of 139 students. Twenty-six
students rank above him, 112 below him. His percentile rank is:

$$100 - \frac{(2700 - 50)}{139} = 100 - 19 = 81$$

In this formula, 50 is subtracted from 100RK to account for half the
space occupied by the individual's score.
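The class-rank conversion for Jones can be written as a one-line function. This is a minimal sketch; the function and parameter names are illustrative.

```python
def class_rank_percentile(rank_from_top, class_size):
    """Convert a rank in class to a percentile rank."""
    return 100 - (100 * rank_from_top - 50) / class_size


# Jones ranks 27th in a class of 139.
jones = class_rank_percentile(27, 139)

print(round(jones))
```

The same value results from counting the 112 students below Jones plus half of his own place: 100(112 + 0.5)/139.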


MEASURES OF RELATIONSHIP 


Correlation. Correlation is the relationship between two or more paired 
variables or two or more sets of data. The degree of relationship is measured 
and represented by the coefficient of correlation. This coefficient may be 
identified by either the letter r, the Greek letter rho ($\rho$), or other symbols
depending upon the data distributions and the way the coefficient has been 
calculated. 

Students who have high intelligence quotients tend to receive high 
scores in mathematics tests, whereas those with low IQs tend to score low. 
When this type of relationship is obtained, the factors of measured intel- 
ligence and scores on mathematics tests are said to be positively correlated. 

Variables are negatively correlated when a large amount
of one variable is associated with a small amount of the other. As one
increases, the other tends to decrease.

When the relationship between two sets of variables is a pure-chance 
relationship, we say that there is no correlation. 

These pairs of variables are usually positively correlated: As one in- 
creases the other tends to increase. 


1. Intelligence               Academic achievement
2. Productivity per acre      Value of farm land
3. Height                     Shoe size
4. Family income              Value of family home


These variables are usually negatively correlated: As one increases the other 
tends to decrease. 


1. Academic achievement       Hours per week of TV watching
2. Total corn production      Price per bushel
3. Time spent in practice     Number of typing errors
4. Age of an automobile       Trade-in value


There are other traits that probably have no correlation. 


1. Body weight Intelligence 
2. Shoe size Monthly salary 


The degree of linear correlation can be represented quantitatively by 
the coefficient of correlation. A perfect positive correlation is +1.00. A
perfect negative correlation is -1.00. A complete lack of relationship is
zero (0). Rarely, if ever, are perfect coefficients of correlations of +1.00
or -1.00 encountered, particularly in relating human traits. Although
some relationships tend to appear fairly consistently, there are variations
or exceptions that reduce the measured coefficient from either a -1.00
or a +1.00 toward zero.

A definition of perfect positive correlation specifies that for every unit 
increase in one variable there is a proportional unit increase in the other. 
The perfect negative correlation specifies that for every unit increase in one 
variable there is a proportional unit decrease in the other. That there can 
be no exceptions explains why coefficients of correlation of +1.00 or -1.00
are not encountered in relating human traits. The sign of the coefficient 
indicates the direction of the relationship, and the numerical value its 
strength. 


The scattergram and linear regression line. When the relationship between
two variables is plotted graphically, paired variable values are plotted
against each other on the X and Y axes.

The line drawn through, or near, the coordinate points is known as
the “line of best fit,” or the regression line. On this line the sum of the
squared deviations of all the coordinate points has the smallest possible
value. As the coefficient approaches zero (0) the coordinate points fall further
from the regression line (see Figure 8-7 for scattergrams of different
correlations).

When the coefficient of correlation is either +1.00 or -1.00, all of
the coordinate points fall on the regression line, indicating that, when r =
+1.00, for every increase in X there is a proportional increase in Y; and
when r = -1.00, for every increase in X there is a proportional decrease
in Y. There are no individual exceptions. If we know a person's score 
on one measure, we can determine his or her exact score on the other 
measure. 

The slope of the regression line, or line of best fit, is not determined
by guess or estimation but by a geometric process that will be described
later.

There are actually two regression lines. When r = +1.00 or -1.00,
the lines are superimposed and appear as one line. As r approaches zero, 
the lines separate further. 

Only one of the regression lines is described in this discussion, the 
Y on X (or Y from X) line. It is used to predict unknown Y values from 
known X values. The X values are known as the predictor variable, and 
the Y values, the predicted variable. The other regression line (not described 
here) would be used to predict X from Y. 


Plotting the slope of the regression line. The slope of the regression
(Y from X) line is a geometric representation of the coefficient of correlation
and is expressed as a ratio of the magnitude of the rise (if r is +) to the
run, or as a ratio of the fall (if r is -) to the run, expressed in standard
deviation units.


FIGURE 8-7 Scatter diagrams illustrating different coefficients of correlation.


For example, if r = +.60, for every sigma unit increase (run) in X,
there is a .60 sigma unit increase (rise) in Y.

$$\frac{+.60\,z_y}{1.00\,z_x}$$

If r = -.60, for every sigma unit increase (run) in X, there is a .60
sigma unit decrease (fall) in Y.

$$\frac{-.60\,z_y}{1.00\,z_x}$$


The geometric relationship between the two legs of the right triangle 
determines the slope of the hypotenuse, or the regression line. 

Because all regression lines pass through the intersection of the mean 
of X and the mean of Y lines, only one other point is necessary to determine 
the slope. By measuring one standard deviation of the X distribution on
the X axis and .60 standard deviation of the Y distribution on the Y axis,
the second point is established (see Figures 8-8 and 8-9).


FIGURE 8-8 A positive regression line, r = +.60.


FIGURE 8-9 A negative regression line, r = -.60.


The regression line (r) involves one awkward feature: all values must
be expressed in sigma scores (z) or standard deviation units. It would be
more practical to use actual scores to determine the slope of the regression
line. This can be done by converting to a slope known as b. The slope of
the b regression line Y on X is determined by the formula:

$$b = r\,\frac{\sigma_y}{\sigma_x}$$

For example, if r = +.60
and $\sigma_y$ = 6
    $\sigma_x$ = 5

$$b = +.60\left(\frac{6}{5}\right) = \frac{3.60}{5} = +.72$$

Thus an r of +.60 becomes b = +.72. Now the ratio of the rise to the run
has another value and indicates a different slope of the regression line
(Figure 8-10).


FIGURE 8-10 Two regression lines, r and b. An r of +.60 is converted to a b of +.72 by the formula
$b = r(\sigma_y / \sigma_x)$. Panels: Slope r (sigma scores) and Slope b (raw scores).

Pearson's Product-Moment Coefficient 
of Correlation (r) 


The most often used and most precise coefficient of correlation is known 
as the Pearson product-moment coefficient (r). This coefficient may be calculated 
by converting the raw scores to sigma scores and finding the mean value 
of their cross-products. 


$$r = \frac{\Sigma (z_x)(z_y)}{N}$$

  $z_x$       $z_y$     $(z_x)(z_y)$
 +1.50     +1.20     +1.80
 +2.00     +1.04     +2.08
  -.75      -.90      +.68
  +.20      +.70      +.14
 -1.00      +.20      -.20
  -.40      +.30      -.12
 +1.40      +.70      +.98
  +.55      +.64      +.35
  -.04      +.10      -.00
  -.10      +.30      -.03


$\Sigma (z_x)(z_y)$ = 5.68

$$r = \frac{5.68}{10} = +.568$$


If most of the negative z values of X are associated with negative z
values of Y, and positive z values of X with positive z values of Y, the
correlation coefficient will be positive. If most of the paired values are of
opposite signs, the coefficient will be negative.

positive correlation   (+)(+) = +   high on X, high on Y
                       (-)(-) = +   low on X, low on Y


negative correlation   (+)(-) = -   high on X, low on Y
                       (-)(+) = -   low on X, high on Y
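The z score method can be sketched in Python as the mean of the cross-products of paired sigma scores. Instead of the ten printed z pairs, this illustration uses the reading and arithmetic scores from Table 8-4, with standard deviations computed by dividing by N as elsewhere in this chapter; the function names are illustrative.

```python
from math import sqrt


def z_scores(scores):
    """Convert raw scores to sigma scores using the N-divisor SD."""
    n = len(scores)
    mean = sum(scores) / n
    sd = sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return [(x - mean) / sd for x in scores]


def pearson_r_from_z(xs, ys):
    """Mean of the cross-products of paired z scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


reading = [95, 90, 85, 80, 75, 70, 65, 60, 55]
arithmetic = [76, 78, 77, 71, 75, 79, 73, 72, 74]

r_z = pearson_r_from_z(reading, arithmetic)
print(round(r_z, 3))
```

Pairs with matching signs contribute positive products and pairs with opposite signs contribute negative ones, so the sign of the mean product follows the rule stated above.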


The z score method is not often used in actual computation because 
it involves the conversion of each score into a sigma score. Two other 
methods, a deviation method and a raw score method, are more convenient, 
more often used, and yield the same result. 

The deviation method uses the following formula and requires the 
setting up of a table with seven columns. 


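$$r = \frac{\sum xy}{\sqrt{\left(\sum x^2\right)\left(\sum y^2\right)}}$$

where Σx² = the sum of the squared deviations of each X score from the mean of X, Σ(X - X̄)²
      Σy² = the sum of the squared deviations of each Y score from the mean of Y, Σ(Y - Ȳ)²
      Σxy = the sum of the cross-products of the paired deviations, Σ(X - X̄)(Y - Ȳ)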

Using the data from Table 8—4, with reading scores being the X variable 
and arithmetic scores being the Y variable, we calculate r like this: 


                  VARIABLES
  X     Y      x     x²      y     y²     xy
 95    76    +20    400     +1      1    +20
 90    78    +15    225     +3      9    +45
 85    77    +10    100     +2      4    +20
 80    71     +5     25     -4     16    -20
 75    75      0      0      0      0      0
 70    79     -5     25     +4     16    -20
 65    73    -10    100     -2      4    +20
 60    72    -15    225     -3      9    +45
 55    74    -20    400     -1      1    +20

ΣX = 675   ΣY = 675   Σx² = 1500   Σy² = 60   Σxy = 130


X̄ = 75   Ȳ = 75


$$r = \frac{130}{\sqrt{(1500)(60)}} = \frac{130}{\sqrt{90{,}000}} = \frac{130}{300} = +.433$$
The raw score method requires the use of five columns as illustrated
below using the same data.


$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$


where ΣX = sum of the X scores
      ΣY = sum of the Y scores
      ΣX² = sum of the squared X scores
      ΣY² = sum of the squared Y scores
      ΣXY = sum of the products of paired X and Y scores
      N = number of paired scores


            VARIABLES
  X     Y      X²      Y²      XY
 95    76    9025    5776    7220
 90    78    8100    6084    7020
 85    77    7225    5929    6545
 80    71    6400    5041    5680
 75    75    5625    5625    5625
 70    79    4900    6241    5530
 65    73    4225    5329    4745
 60    72    3600    5184    4320
 55    74    3025    5476    4070


ΣX = 675   ΣY = 675   ΣX² = 52,125   ΣY² = 50,685   ΣXY = 50,755


$$r = \frac{9(50{,}755) - (675)(675)}{\sqrt{9(52{,}125) - (675)^2}\,\sqrt{9(50{,}685) - (675)^2}}$$

$$r = \frac{456{,}795 - 455{,}625}{\sqrt{469{,}125 - 455{,}625}\,\sqrt{456{,}165 - 455{,}625}}$$

$$r = \frac{1170}{\sqrt{13{,}500}\,\sqrt{540}} = \frac{1170}{(116.19)(23.24)} = \frac{1170}{2700.26} = +.433$$
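As a check on the hand computation, the deviation method and the raw score method can both be sketched in Python on the same nine pairs of reading and arithmetic scores. The function names `pearson_deviation` and `pearson_raw` are illustrative assumptions; the formulas are the two given in the text.

```python
import math

# Reading (X) and arithmetic (Y) scores from the worked example.
X = [95, 90, 85, 80, 75, 70, 65, 60, 55]
Y = [76, 78, 77, 71, 75, 79, 73, 72, 74]


def pearson_deviation(xs, ys):
    """Deviation method: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [v - mean_x for v in xs]  # deviations of X from its mean
    dy = [v - mean_y for v in ys]  # deviations of Y from its mean
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


def pearson_raw(xs, ys):
    """Raw score method: uses only sums of scores, squares, and products."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(v * v for v in xs)
    sum_y2 = sum(v * v for v in ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r_dev = pearson_deviation(X, Y)
r_raw = pearson_raw(X, Y)
print(round(r_dev, 3), round(r_raw, 3))
```

Both functions return the same value because the two formulas are algebraically equivalent; the raw score form simply avoids computing the deviations first.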


Rank Order Correlation (ρ)


A particular form of the Pearson product-moment correlation that can be 
used with ordinal data is known as the Spearman rank order coefficient of 
correlation. The symbol ρ (rho) is used to represent this correlation
coefficient. The paired variables are expressed as ordinal values (ranked)
rather than as interval or ratio values. It lends itself to an interesting
graphic demonstration.

In the following example, the students ranking highest in IQ rank 
highest in mathematics, and those lowest in IQ, lowest in mathematics. 


                        Achievement in
Pupil      IQ rank      mathematics rank
A             1               1
B             2               2
C             3               3
D             4               4
E             5               5


Perfect positive coefficient of correlation
ρ = +1.00


In the following example, the students ranking highest in time spent 
in practice rank lowest in number of errors. 


           Time spent in      Number of typing
Pupil      practice rank      errors rank
A               1                  5
B               2                  4
C               3                  3
D               4                  2
E               5                  1


Perfect negative coefficient of correlation
ρ = -1.00
In the following example, there is probably little more than a pure
chance relationship (due to sampling error) between height and
intelligence.


Pupil      Height rank      IQ rank
A               1              3
B               2              4
C               3              2
D               4              1
E               5              5


Very low coefficient of correlation
ρ = +.10
To compute the Spearman rank order coefficient of correlation, this
rather simple formula is used:


$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$


where D = the difference between paired ranks
      ΣD² = the sum of the squared differences between ranks
      N = number of paired ranks


If we converted the previously used data to ranks and calculated 
Spearman's p, it would look like this: 


              RANK IN     RANK IN
PUPIL         READING     ARITHMETIC      D      D²
Arthur           1            4          -3       9
Betty            2            2           0       0
John             3            3           0       0
Katherine        4            9          -5      25
Charles          5            5           0       0
Larry            6            1          +5      25
Donna            7            7           0       0
Edward           8            8           0       0
Mary             9            6          +3       9

ΣD² = 68

$$\rho = 1 - \frac{6(68)}{9(81 - 1)} = 1 - \frac{408}{720} = 1 - .567$$

$$\rho = +.433$$
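The rank order computation is short enough to sketch directly. In the Python below, the function name `spearman_rho` is an illustrative assumption, and the formula is valid as written only when there are no tied ranks.

```python
# Ranks in reading and arithmetic for the nine pupils, Arthur through Mary.
reading_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9]
arithmetic_rank = [4, 2, 3, 9, 5, 1, 7, 8, 6]


def spearman_rho(rank_x, rank_y):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), assuming no tied ranks."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


rho = spearman_rho(reading_rank, arithmetic_rank)
print(round(rho, 3))
```

The same function reproduces the three five-pupil illustrations: identical ranks give +1.00, fully reversed ranks give -1.00, and the height and IQ ranks give about +.10.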


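As we have just demonstrated, Spearman's ρ and Pearson's r yielded
the same result. This happens here because there are no ties and the raw
scores on each variable are evenly spaced, so the ranks are a linear
function of the scores. In general, Spearman's ρ equals Pearson's r
computed on the ranks when there are no ties. When there are ties, the
results will not be identical, but the difference will usually be small.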

The Spearman rank order coefficient of correlation computation is 
quick and easy. It is an acceptable method if data are available only in 
ordinal form. Teachers may find this computation method useful when 
conducting studies using a single class of students as subjects. 


Phi Correlation Coefficient (φ)


The data are considered dichotomous when there are only two choices for
scoring a variable (e.g., pass-fail or female-male). In these cases, each
person's score usually would be represented by a 0 or 1, although sometimes
1 and 2 are used instead. The Pearson product-moment correlation, when
both variables are dichotomous, is known as the phi (φ) coefficient. The
formula for φ is simpler than for Pearson's r but algebraically identical.
Because we rarely have two dichotomous variables whose relationship we
want to know, we will not present the formula here. This brief mention
of φ is to make the reader aware of it. Those wishing more detail should
refer to one of the many statistics texts available (e.g., Ferguson, 1981;
Glass & Hopkins, 1984).


INTERPRETATION OF A CORRELATION COEFFICIENT 


Two circumstances can cause a higher or lower correlation than usual. 
First, when one person or relatively few people have a pair of scores that 
differ markedly from the rest of the sample's scores, the resulting r may 
be spuriously high. When this occurs, the researcher needs to decide whether 
to remove this individual's pair of scores (known as an outlier) from the 
data analyzed. Second, when all other things are equal, the more homo- 
geneous a group of scores, the lower their correlation will be. That is, the 
smaller the range of scores, the smaller r will be. Researchers need to 
consider this potential problem when selecting samples that may be highly 
homogeneous. However, if the researcher knows the standard deviation 
of the heterogeneous group from which the homogeneous group was se- 
lected, Glass and Hopkins (1984) and others describe a formula that corrects 
for the restricted range and provides the correlation for the heterogeneous 
group. 

There are a number of ways to interpret a correlation coefficient or 
adjusted correlation coefficient depending upon the researcher's purpose 
and the circumstances that may influence the correlation's magnitude. One 
method that is frequently presented is to use a crude criterion for evaluating 
the magnitude of a correlation: 


COEFFICIENT (r)      RELATIONSHIP

.00 to .20           Negligible
.20 to .40           Low
.40 to .60           Moderate
.60 to .80           Substantial
.80 to 1.00          High to very high


Another interpretative approach is a test of statistical significance of 
the correlation, based upon the concepts of sampling error and tests of 
significance described in Chapter 9. 

Another way of interpreting a correlation coefficient is in terms of 
variance. The variance of the measure that we want to predict can be 
divided into the part that is explained by, or due to, the predictor variable
and the part that is explained by other factors (generally unknown),
including sampling error. We find this percentage of explained variance by
calculating r², known as the coefficient of determination. The percent of
variance not explained by the predictor variable is then 1 - r².

An example may help the reader understand this important concept. 
In combining studies using IQ to predict general academic achievement, 
Walberg (1984) found the overall correlation between these variables to be 
.71. We can use this correlation to find r? = .50. This means that 50 percent 
of the variance in academic achievement (how well or poorly different 
students do) is predictable from the variance of IQ. This also obviously 
means that 50 percent of the variance of academic achievement is due to 
factors other than IQ, such as motivation, home environment, school
attended, and test error. Walberg also found that the correlation of IQ with
science achievement was .48. This means that only 23 percent (r²) of
variance in science achievement is predictable by IQ and that 77 percent is
due to other factors, some known and some unknown.
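The arithmetic behind the two Walberg figures can be sketched as follows. The helper name `explained_variance` is an illustrative assumption; the correlations .71 and .48 are the ones quoted in the text.

```python
def explained_variance(r):
    """Return (r squared, 1 - r squared) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and +1")
    r_squared = r * r
    return r_squared, 1 - r_squared


for label, r in [("general academic achievement", 0.71),
                 ("science achievement", 0.48)]:
    explained, unexplained = explained_variance(r)
    print(f"{label}: r = {r:.2f}, "
          f"explained = {explained:.0%}, unexplained = {unexplained:.0%}")
```

Note how quickly explained variance falls as r decreases: a drop in r from .71 to .48 cuts the explained share by more than half.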

There are additional techniques, too advanced for this introductory 
text, that allow researchers to use more than one variable. Thus it is possible, 
for example, to use a combination of IQ, self-concept scores, a measure of
motivation, and a socioeconomic scale to predict academic achievement. 
This multiple correlation would increase the correlation, which would in 
turn increase the percent of variance of academic achievement that is ex- 
plained by known factors. 


Misinterpretation of the Coefficient of Correlation 


Several fallacies and limitations should be considered in interpreting the
meaning of a coefficient of correlation. The coefficient does not imply a
cause-and-effect relationship between variables. High positive correlations
have been observed between the number of storks' nests and the number
of human births in northwestern Europe, and between the number of
ordinations of ministers in the New England colonies and the consumption
of gallons of rum. These high correlations obviously do not imply causality.
As population increases, both good and bad things are likely to increase
in frequency.

Similarly, a zero (or even negative) correlation does not necessarily 
mean that no causation is possible. Glass and Hopkins (1984) point out 
that "some studies with college students have found no correlation between 
hours of study for an examination and test performance. . . . [This is likely 
due to the fact that] some bright students study little and still achieve 
average scores, whereas some of their less gifted classmates study diligently 
but still achieve an average performance. A controlled experimental study 
would almost certainly show some causal relationship" (p. 106).


Prediction


An important use of the coefficient of correlation and the Y on X regression 
line is for prediction of unknown Y values from known X values. Because 
it is a method for estimating future performance of individuals on the basis
of past performance of a sample, prediction is an inferential application 
of correlational analysis. It has been included in this chapter to illustrate 
one of the most useful applications of correlation. 

Let us assume that a college's admissions officers wish to predict the
likely academic performance of students considered for admission or for 
scholarship grants. They have built up a body of data based upon the past 
records of a substantial number of admitted college students over a period 
of several years. They have calculated the coefficient of correlation between 
their high school grade-point averages and their college freshman grade- 
point averages. They can now construct a regression line and predict the 
future college freshman GPA for any prospective student, based upon his 
or her high school GPA. 

Let us assume that the admissions officers found the coefficient of 
correlation to be +.52. The slope of the line could be used to determine 
any Y values for any X value. This process would be quite inconvenient, 
however, for all grade-point averages would have to be entered as sigma 
(z) values. 

A more practicable procedure would be to construct a regression line 
with a slope of b so that any college grade-point average (Y) could be 
predicted directly from any high school grade-point average. The b regres- 
sion line and a carefully drawn graph would provide a quick method for 
prediction. For example: 


If: $r = +.52$, $S_Y = .50$, $S_X = .60$

Then:

$$b = r\left(\frac{S_Y}{S_X}\right) = .52\left(\frac{.50}{.60}\right) = +.43$$


$X_a$ is student A's high school GPA, $Y_a$ his predicted college GPA.
$X_b$ is student B's high school GPA, $Y_b$ her predicted college GPA.
Figure 8-11 uses these data to predict college GPA from high school GPA.


Another, and perhaps more accurate, alternative for predicting un- 
known Ys from known Xs is to use the regression equation rather than the 
graph. The formula for predicting Y from X is: 


$$Y = a + bX$$

where Y = the predicted score (e.g., college freshman GPA)
      X = the predictor score (e.g., high school GPA)
      b = slope
      a = constant, or Y intercept


We have already seen that $b = r(S_Y/S_X)$. We can find a by
$a = \bar{Y} - b\bar{X}$. Given the following data, we can then find the
most likely freshman GPA for two students.


b = .43 (found earlier)
X̄ = 2.10
Ȳ = 2.40

$$a = 2.40 - 2.10(.43) = 2.40 - .90 = 1.50$$

$X_a$ (student A's high school GPA) = 2.00
$X_b$ (student B's high school GPA) = 3.10


$$Y_a = 1.50 + .43(X_a) = 1.50 + .43(2.00) = 1.50 + .86 = 2.36$$

$$Y_b = 1.50 + .43(X_b) = 1.50 + .43(3.10) = 1.50 + 1.33 = 2.83$$


For student A, whose high school GPA was below the mean, the 
predicted college GPA was also below the mean. For student B, whose high 
school GPA was well above the mean, the predicted GPA was substantially 
above the mean. These results are consistent with a positive coefficient of 
correlation in general: high in X, high in Y; low in X, low in Y.
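The prediction steps can be sketched in Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the inputs (r = +.52, S_Y = .50, S_X = .60, mean high school GPA 2.10, mean college GPA 2.40) are the values used in the example. The slope is rounded to two places, as the text does, before the intercept is computed.

```python
def slope_b(r, s_y, s_x):
    """Raw score slope of the Y on X regression line: b = r * (S_Y / S_X)."""
    return r * s_y / s_x


def intercept_a(mean_y, mean_x, b):
    """Y intercept: a = mean of Y minus b times mean of X."""
    return mean_y - b * mean_x


def predict(x, a, b):
    """Predicted Y for a known X: Y = a + bX."""
    return a + b * x


b = round(slope_b(0.52, 0.50, 0.60), 2)   # rounded to .43, as in the text
a = intercept_a(2.40, 2.10, b)            # about 1.50

for student, hs_gpa in [("A", 2.00), ("B", 3.10)]:
    print(student, round(predict(hs_gpa, a, b), 2))
```

Using the equation rather than reading values off a graph avoids drawing error, which is why the text calls it the more accurate alternative.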


STANDARD ERROR OF ESTIMATE 


When the coefficient of correlation based upon a sufficient body of data
has been determined as ±1.00, there will be no error of prediction. Perfect
correlation indicates that for every increase in X, there is a proportional
increase (when +) or proportional decrease (when -) in Y. There are no
exceptions. But when the magnitude of r is less than +1.00 or -1.00,
error of prediction is inherent because there have been exceptions to a
consistent, orderly relationship. The regression line does not coincide with
or pass through all of the coordinate values used in determining the slope.

FIGURE 8-11 A regression line used to predict college freshman GPA from high school GPA.

A measure for estimating this prediction error is known as the standard
error of estimate ($S_{est}$).


$$S_{est} = S_Y\sqrt{1 - r^2}$$


As the coefficient of correlation increases, the prediction error
decreases. When r = ±1.00:


$$S_{est} = S_Y\sqrt{1 - r^2} = S_Y\sqrt{1 - (1)^2} = S_Y(0) = 0$$

When r = 0:


$$S_{est} = S_Y\sqrt{1 - (0)^2} = S_Y(1) = S_Y$$


When r = 0 (or when the coefficient of correlation is unknown), the
best blind prediction of any Y from any X is the mean of Y. This is true
because we know that most of the scores in a normal distribution cluster
around the mean and that about 68 percent of them would probably fall
within one standard deviation from the mean. In this situation the standard
deviation of Y may be thought of as the standard error of estimate. When
r = 0, $S_{est} = S_Y$.
If the coefficient of correlation is more than zero, this blind prediction
can be improved upon in these ways:


1. By plotting Y from a particular X from the regression line (see Figure
8-12)


2. By reducing the error of prediction of Y by calculating how much $S_Y$
is reduced by the coefficient of correlation


For example, when r = ±.60:


$$S_{est} = S_Y\sqrt{1 - r^2} = S_Y\sqrt{1 - (.60)^2} = S_Y\sqrt{1 - .36} = S_Y\sqrt{.64} = .80\,S_Y$$


Thus the estimate error of Y has been reduced from $S_Y$ to $.80\,S_Y$.
Interpretation of the standard error of estimate is similar to the
interpretation of the standard deviation. If r = +.60, the standard error of
estimate of Y will be $.80\,S_Y$. An actual performance score of Y would
probably fall within a band of $\pm .80\,S_Y$ from the predicted Y in about
68 of 100 predictions. In other words, the probability is that the predicted
score would not be more than one standard error of estimate from the actual
score in about 68 percent of the predictions.
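The way the standard error of estimate shrinks as r grows can be tabulated with a short Python sketch. The function name `standard_error_of_estimate` is an illustrative assumption, and S_Y is set to 1.0 so that each result reads as a fraction of S_Y.

```python
import math


def standard_error_of_estimate(s_y, r):
    """S_est = S_Y * sqrt(1 - r^2)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and +1")
    return s_y * math.sqrt(1 - r ** 2)


# With S_Y = 1.0 the output is the fraction of S_Y that remains as error.
for r in (0.0, 0.30, 0.60, 0.90, 1.0):
    print(f"r = {r:.2f}  S_est = {standard_error_of_estimate(1.0, r):.2f} S_Y")
```

The reduction is not proportional to r: a correlation of .60 removes only one fifth of the prediction error, which is a useful caution when interpreting moderate coefficients.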

In addition to the applications described, the coefficient of correlation
is indispensable to psychologists who construct and standardize
psychological tests and inventories. A few of the basic procedures are
briefly described.

FIGURE 8-12 A predicted Y score from a given X score, showing the standard error of estimate.

Computing the coefficient of correlation is the usual procedure used 
to evaluate the degree of validity and reliability of psychological tests and 
inventories (see Chapter 7 for a more detailed description of these con- 
cepts). 


The coefficient of validity. A test is said to be valid to the degree that it 
measures what it claims to measure, or, in the case of predictive validity, 
to the extent that it predicts accurately such types of behavior as academic 
success or failure, job success or failure, or stability or instability under 
stress. Tests are usually validated by correlating test scores against some
outside criteria, which may be scores on tests of accepted validity, successful 
performance or behavior, or the expert judgment of recognized authorities. 


The coefficient of reliability. A test is said to be reliable to the degree 
that it measures accurately and consistently, yielding comparable results 
when administered a number of times. There are a number of ways of 
using the process of correlation to evaluate reliability: 


1. Test-retest: correlating the scores on two or more successive
administrations of the test (administration number 1 versus administration
number 2)

2. Equivalent forms: correlating the scores when groups of individuals
take equivalent forms of the test (form L versus form N)

3. Split halves: correlating the scores on the odd items of the test
(numbers 1, 3, 5, 7, etc.) against the even items (numbers 2, 4, 6, 8, etc.).
This method yields lower correlations because of the reduction in size
to two tests of half the number of items. This may be corrected by
the application of the Spearman-Brown prophecy formula.
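The text names the Spearman-Brown prophecy formula without stating it. As a hedged sketch, the form usually given for estimating full-length reliability from a split-half correlation is r_full = 2r_half / (1 + r_half). The function name and the sample value of .60 below are illustrative assumptions, not figures from the text.

```python
def spearman_brown(r_half):
    """Estimate full-test reliability from a split-half correlation.

    Uses the form of the Spearman-Brown formula for a test doubled in
    length: r_full = 2 * r_half / (1 + r_half).
    """
    if not -1.0 < r_half <= 1.0:
        raise ValueError("r_half must be greater than -1 and at most +1")
    return 2 * r_half / (1 + r_half)


# Hypothetical split-half correlation of .60 between odd and even items.
print(round(spearman_brown(0.60), 2))
```

The corrected value is always at least as large as the split-half correlation when that correlation is positive, which reflects the point made above that halving a test lowers the observed coefficient.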



A NOTE OF CAUTION 


Statistics is an important tool of the research worker, and an understanding 
of statistical terminology, methodology, and logic is important for the con- 
sumer of research. A number of limitations, however, should be recognized 
in using statistical processes and in drawing conclusions from statistical
evidence.


1. Statistical process, a servant of logic, has value only if it verifies,
clarifies, and measures relationships that have been established by clear,
logical analysis. Statistics is a means, never an end, of research.

2. A statistical process should not be employed in the analysis of data
unless it adds clarity or meaning to the analysis of data. It should not
be used as window dressing to impress the reader.

3. The conclusions derived from statistical analysis will be no more
accurate or valid than the original data. To use an analogy, no matter
how elaborate the mixer, a cake made of poor ingredients will be a
poor cake. All the refinement of elaborate statistical manipulation will
not yield significant truths if the data result from crude or inexact
measurement. In computer terminology, this is known as GIGO,
"garbage in, garbage out."

4. All treatment of data must be checked and double-checked frequently
to minimize the likelihood of errors in measurement, recording,
tabulation, and analysis.

5. There is a constant margin of error wherever measurement of human
beings is involved. The error is increased when qualities or
characteristics of human personality are subjected to measurement or when
inferences about the population are made from measurements derived
from statistical samples.

6. When comparisons or contrasts are made, a mere number difference
is not in itself a valid basis for any conclusion. A test of statistical
significance should be employed to weigh the possibility that chance
in sample selection could have yielded the apparent difference. To
apply these measures of statistical significance is to remove some of
the doubt from the conclusions.

7. Statisticians and liars are often equated in humorous quips. There is
little doubt that statistical processes can be used to prove nearly
anything that one sets out to prove if the procedures used are
inappropriate. Starting with false assumptions, using inappropriate
procedures, or omitting relevant data, the biased investigator can arrive at
false conclusions. These conclusions are often particularly dangerous
because of the authenticity that the statistical treatment seems to
confer. Of course, intentionally using inappropriate procedures or
omitting relevant data constitutes unethical behavior and is quite rare.


Distortion may be deliberate or unintentional. In research, omitting
certain facts or choosing only those facts favorable to one's position is as
culpable as actual distortion, which has no place in research. The reader
must always try to evaluate the manipulation of data, particularly when the
report seems to be persuasive.


SUMMARY


This chapter deals with only the most elementary descriptive statistical
concepts. For a more complete treatment the reader is urged to consult
one or more of the references listed.

Statistical analysis is the mathematical process of gathering, organizing,
analyzing, and interpreting numerical data, and is one of the basic
phases of the research process. Descriptive statistical analysis involves the
description of a particular group. Inferential statistical analysis leads to
judgments about the whole population, to which the sample at hand is
presumed to be related.

Data are often organized in arrays in ascending or descending numerical
order. Data are often grouped into class intervals so that analysis
is simplified and characteristics more readily noted.

Measures of central tendency (mean, median, and mode) describe
data in terms of some sort of average. Measures of position, spread, or
dispersion describe data in terms of relationship to a point of central
tendency. The range, deviation, variance, standard deviation, percentile, and
sigma score are useful measures of position, spread, or dispersion.

Measures of relationship describe the relationship of paired variables,
quantified by a coefficient of correlation. The coefficient is useful in
educational research in standardizing tests and in making predictions when
only some of the data are available. Note that a high coefficient does not
imply a cause-and-effect relationship but merely quantifies a relationship
that has been logically established prior to its measurement.

Statistics is the servant, not the master, of logic; it is a means rather
than an end of research. Unless basic assumptions are valid, unless the
right data are carefully gathered, recorded, and tabulated, and unless the
analysis and interpretations are logical, statistics can make no contribution
to the search for truth.


EXERCISES


1. More than half the families in a community can have an annual income that is 
lower than the mean income for that community. Do you agree or disagree? 
Why? 

2. The median is the midpoint between the highest and the lowest scores in a 
distribution. Do you agree or disagree? Why? 
3. Compute the mean and the median of this distribution:


74 
72 
70 
65 
63 
61 
56 
51 
42 
40 
37 
33 


4. Determine the mean, the median, and the range of this distribution: 


88 
86 
85 


5. Compute the variance (σ²) and the standard deviation (σ) for this set of scores:


27
27
25
24
20
18
16
16
14
12
10
77

6. The distribution with the larger range is the distribution with the larger standard
deviation. Do you agree or disagree? Why?

7. If five points were added to each score in a distribution, how would this change
each of the following:
a. the range
b. the mean
c. the median
d. the mode
e. the variance
f. the standard deviation


8. Joan Brown ranked twenty-seventh in a graduating class of 367. What was her
percentile rank?

9. In a coin-tossing experiment where N = 144 and P (probability) = .50, draw
the curve depicting the distribution of probable outcomes of heads appearing
for an infinite number of repetitions of this experiment. Indicate the number of
heads for the mean, and at 1, 2, and 3 standard deviations from the mean,
both positive and negative.

10. Assuming the distribution to be normal with a mean of 61 and a standard
deviation of 5, calculate the following standard score equivalents:

 X     x     z     T
66
58
70
61
52

11. Using the normal probability table in Appendix B, calculate the following values:
a. below -1.25z  ____%
b. above -1.25z  ____%
c. between -1.40z and +1.67z  ____%
d. between +1.50z and +2.50z  ____%
e. 65th percentile rank  ____z
f. 43rd percentile rank  ____z
g. top 1% of scores  ____z
h. middle 50% of scores  ____z to ____z
i. not included between -1.00z and +1.00z  ____%
j. 50th percentile rank  ____z

12. Assuming a normal distribution of scores, a test has a mean score of 100 and
a standard deviation of 15. Compute the following scores:
a. score that cuts off the top 10%  ____
b. score that cuts off the lower 40%  ____
c. percentage of scores above 90  ____
d. score that occupies the 68th percentile rank  ____
e. score limits of the middle 68%  ____ to ____
13. Consider the following table showing the performance of three students in
algebra and history:

            MEAN     σ     TOM     DONNA     HARRY
Algebra      90     30      60      100        85
History      20      4      25       22        19

Who had:
a. the poorest score on either test?
b. the best score on either test?
c. the most consistent scores on both tests?
d. the least consistent scores on both tests?
e. the best mean score on both tests?
f. the poorest mean score on both tests?

14. The coefficient of correlation measures the magnitude of the cause-and-effect
relationship between paired variables. Do you agree or disagree? Why?


15. Using the Spearman rank order coefficient of correlation method, compute ρ.


            X VARIABLE     Y VARIABLE
Mary            1              3
Peter           2              4
Paul            3              1
Helen           4              2
Ruth            5              7
Edward          6              5
John            7              6


16. Two sets of paired variables are expressed in sigma (z) scores. Compute the
coefficient of correlation between them.


  z_X       z_Y

 +.70      +.55
 -.20      -.32
+1.50     +2.00
+1.33     +1.20
 -.88     -1.06
 +.32      -.40
-1.00      +.50
 +.67      +.80
 -.30      -.10
+1.25     +1.0
 +.50      -.20
+.50 —.20 


17. Using the Pearson product-moment raw score method, compute the coefficient
of correlation between these paired variables:


 X     Y     X²     Y²     XY
66    42
50    55
43    60
 8    24
12    30
35    18
24    48
20    35
16    22
54    38

18. A class took a statistics test. The students completed all of the questions. The
coefficient of correlation between the number of correct and the number of
incorrect responses for the class was ____.

19. There is a significant difference between the slope of the regression line r and
that of the regression line b. Do you agree? Why?

20. Compute the standard error of estimate of Y from X when:


S_Y = 6.20
r = +.60
r= +.60 


21. Given the following information, predict the Y score from the given X, when X 
= 90, and: 


X=80 S= 12 
INFERENTIAL DATA ANALYSIS


In Chapter 1 we described the ultimate purpose of research as the discovery 
of general principles based upon observed relationships between variables. 
If it were necessary to observe all of the individuals in the population about 
which one wished to generalize, the process would be never-ending and 
prohibitively expensive. The practical solution is to select samples that are 
representative of the population of interest; then, through observations 
and analysis of the sample data, the researcher may infer characteristics of 
the population. (The reader may wish to refer to the discussion of types 
of samples and sampling procedures presented in Chapter 1.) 
Many laypersons share the misconception that an adequate sample must be
a miniature carbon copy, or have the identical characteristics, of the population
under study. If a large number of researchers selected random samples of
100 teachers from the population of all teachers in California, the mean weight
of the samples would not be identical. A few would be relatively high, a few
relatively low, but most would tend to cluster around the population mean.
This variation of sample means is due to what is known as sampling error.
This term does not suggest any fault or mistake in the sampling process but
merely describes the chance variations that are inevitable when a number of
randomly selected sample means are computed.

Estimating or inferring a population characteristic (parameter) from 
a random sample (statistic) is not an exact process. It has been noted that 
successive means of randomly selected samples from the same population 
are not identical. Thus, if these means are not identical, it would be logical 
to assume that any one of them probably differs from the population mean. 
This would seem to present an insurmountable obstacle to statisticians, for 
they have only a sample to use as a basis for generalizations about a pop- 
ulation. Fortunately, an advantage of random selection is that the sample 
statistic will be an unbiased estimate of the population parameter. Because 
the nature of the variations of random sample means is known, it is possible 
to estimate the degree of variation of sample means on a probability basis.


THE CENTRAL LIMIT THEOREM 


An important principle, known as the central limit theorem, describes the
characteristics of sample means.

If a large number of equal-sized samples (greater than 30 subjects) is
selected at random from an infinite population:


1. The means of the samples will be normally distributed.

2. The mean value of the sample means will be the same as the mean
of the population.

3. The distribution of sample means will have its own standard deviation.
This is in actuality the distribution of the expected sampling error.
Known as the standard error of the mean, it is computed from this
formula:

$$S_{\bar{X}} = \frac{S}{\sqrt{N}}$$

where S = the standard deviation of individual scores
      N = the size of the sample
      $S_{\bar{X}}$ = the standard error of the mean


To illustrate the operation of the central limit theorem, let us assume
that the mean of a sample is 180 and the standard deviation is 12. Figure
9-1 illustrates the relationship between the distribution of individual scores
and the distribution of sample means when the sample size is 36. If
$\bar{X} = 180$, $N = 36$, and $S = 12$:

$$S_{\bar{X}} = \frac{12}{\sqrt{36}} = \frac{12}{6} = 2$$

FIGURE 9-1 Normal distribution of individual scores and of sample means when N = 36.


The standard error of the mean has a smaller value than the standard 
deviation of individual scores. This is understandable, because in comput- 
ing the means of samples, the extreme scores are not represented; means 
are middle score values. Note the difference between the range and stand- 
ard deviation of individual scores and those of the sample means. 

From the formula

$$S_{\bar{X}} = \frac{S}{\sqrt{N}}$$

it is apparent that as the size of the sample increases, the standard error
of the mean decreases. To cite extreme cases as illustrations, as the sample
N approaches infinity, the mean approaches the population mean and the
standard error of the mean approaches zero. As the sample is reduced in
size and approaches one, the standard error of the mean approaches the
standard deviation of the individual scores.


As sample size increases, the magnitude of the error decreases. Sample size 
and sampling error are negatively correlated (see Figure 9-2). 

It may be generalized that, as the number of independent observations 
increases, the error involved in generalizing from sample values to population 
values decreases and accuracy of prediction increases. 
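
The relationship between sample size and the standard error of the mean can be tabulated directly. The following is a minimal sketch; the function name `standard_error` and the particular sample sizes are our own illustrative choices, and S = 12 is taken from the Figure 9-1 example.

```python
import math


def standard_error(s, n):
    """Standard error of the mean: S divided by the square root of N."""
    return s / math.sqrt(n)


S = 12  # standard deviation of individual scores, as in the Figure 9-1 example

# Larger samples give smaller standard errors; a sample of one gives S itself.
for n in (1, 4, 9, 36, 144, 3600):
    print(f"N = {n:>4}   standard error = {standard_error(S, n):.2f}")
```

Note that quadrupling the sample size only halves the standard error, because N enters the formula through its square root.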

Statisticians who must estimate the population mean from a sample mean 
can expect that the obtained sample mean will not be too far away from 
the unknown population mean. One might say that the population mean 
is "known only to God," but a particular mean calculated from a randomly 
selected sample can be related to the population mean in the same way as 
an individual's score is related to the mean, by using the normal curve table 
in Appendix B. 

The chances or probabilities are approximately 

68/100 that the sample mean will not be farther than 1 $S_{\bar{X}}$ from the 
population mean 

95/100 that the sample mean will not be farther than 1.96 $S_{\bar{X}}$ from the 
population mean 

99/100 that the sample mean will not be farther than 2.58 $S_{\bar{X}}$ from the 
population mean 


FIGURE 9-2 The relationship between sample size and the magnitude of sampling error. 

Thus the value of a population mean, inferred from a randomly 
selected sample mean, can be estimated on a probability basis. In the example 
presented in Figure 9-1, since $S_{\bar{X}}$ = 2 points, there is approximately 
a 68/100 probability that the mean of any randomly selected sample of N = 
36 and S = 12 would not be more than 2 points away from the population 
mean, and a 95/100 probability that the sample mean would not be more than 
3.92 points away (±1.96 $S_{\bar{X}}$). 

Knowing the mean and the standard error of the mean of a sample, 
we can easily determine the confidence interval, within which the "true" 
mean of the population most likely will be. To find the 95 percent confi- 
dence interval, the standard error of the mean is multiplied by 1.96 and 
the result is added to and subtracted from the mean. To find the 99 percent 
confidence interval, the standard error of the mean is multiplied by 2.58 
and the result is added to and subtracted from the mean. Thus if we had 
a sample with a mean of 93, and a standard error of the mean ($S_{\bar{X}}$) of 3.2, 
the 95 percent confidence interval would be 

$$\mu_{95\%} \text{ (the population mean)} = 93 \pm (1.96)S_{\bar{X}} = 93 \pm (1.96)(3.2) = 93 \pm 6.27$$

$$\mu_{95\%} = \text{between } 86.73 \text{ and } 99.27$$

The 99 percent confidence interval would be 

$$\mu_{99\%} = 93 \pm (2.58)S_{\bar{X}} = 93 \pm (2.58)(3.2) = 93 \pm 8.26$$

$$\mu_{99\%} = \text{between } 84.74 \text{ and } 101.26$$

We could then say that 95 times out of 100 we would probably be 
correct in stating that the mean of the population is between 86.73 and 
99.27; and correct 99 times out of 100 in stating that the mean of the 
population is between 84.74 and 101.26. 
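
The confidence interval computation is simple enough to express as a short function. This is a sketch under the large-sample assumption used in the text (z multipliers of 1.96 and 2.58); the function name `confidence_interval` is ours.

```python
def confidence_interval(mean, standard_error, z):
    """Return (lower, upper) limits: mean plus and minus z standard errors."""
    margin = z * standard_error
    return mean - margin, mean + margin


# Sample mean of 93 with a standard error of 3.2, as in the example above.
low95, high95 = confidence_interval(93, 3.2, 1.96)
low99, high99 = confidence_interval(93, 3.2, 2.58)

print(f"95 percent interval: {low95:.2f} to {high95:.2f}")
print(f"99 percent interval: {low99:.2f} to {high99:.2f}")
```

The wider 99 percent interval is the price of greater confidence: to be right more often, we must claim less precision.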


PARAMETRIC TESTS 


Parametric tests are considered to be the most powerful tests and should 
be used if their basic assumptions can be met. These assumptions are based 
on the nature of the population distribution and on the way the type of 
scale is used to quantify the data observations. However, as we mentioned 
in Chapter 8, some parametric tests (the t test and analysis of variance, in 
particular) are quite robust and are appropriate even when some assumptions 
are violated (see Glass & Hopkins, 1984, for a more complete explanation). 
The assumptions for most parametric tests are the following: 

1. The observations are independent. The selection of one case is not 
dependent upon the selection of any other case (there are specific 
parametric tests for nonindependent samples). 

2. The samples have equal or nearly equal variances. This condition is 
particularly important to determine when samples are small. 

3. The variables described are expressed in interval or ratio scales. Nom- 
inal measures (frequency counts) and ordinal measures (ranking) do 
not qualify for parametric treatment. 


TESTING STATISTICAL SIGNIFICANCE 


The Significance of the Difference between the 
Means of Two Independent Groups 


Because a mean is probably the most satisfactory measure for characterizing 
a group, researchers find it important to determine whether the difference 
between means of samples is significant. To illustrate the point, an example 
might be helpful. 

Let us assume that an experiment is set up to compare the relative 
effectiveness of two methods of teaching reading. A sample is randomly 
selected and the subjects randomly assigned to either the experimental 
group or the control group. 

The experimental group is taught by the initial teaching alphabet 
method and the control group by the traditional alphabet. At the end of 
a year a standardized reading test is administered and the mean score of 
each group is computed. The effectiveness of the experimental group 
method as compared to the effectiveness of the control group method is 
the issue, with the end-of-year mean scores of each group the basis for 
comparison. 

A mere quantitative superiority of the experimental group mean score 
over the control group mean score is not conclusive proof of its superiority. 
Because we know that the means of two groups randomly drawn from the 
same population are not necessarily identical, any difference that appeared 
at the end of the experimental cycle could possibly be attributed to sampling 
error or chance. To be statistically significant, the difference must be greater 
than that reasonably attributed to sampling error. Determining whether a 
difference is significant always involves discrediting a sampling error ex- 
planation. The test of the significance of the difference between two means 
is known as a t test. It involves the computation of the ratio between ex- 
perimental variance (observed difference between two sample means) and 
error variance (the sampling error factor). 

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{N_1} + \dfrac{S_2^2}{N_2}}}$$

$\bar{X}_1$ = mean of experimental sample 
$\bar{X}_2$ = mean of control sample 
$N_1$ = number of cases in experimental sample 
$N_2$ = number of cases in control sample 
$S_1^2$ = variance of experimental sample 
$S_2^2$ = variance of control sample 

If the value of the numerator in this ratio is not significantly greater 
than the denominator, it is likely that sampling error—not the effect of 
the treatment or experimental variable—is indicated. But before we discuss 
the quantitative criteria that determine the statistical significance of the 
difference between means, two additional concepts should be considered: 


1. The null hypothesis ($H_0$) 
2. The level of significance 


The Null Hypothesis ($H_0$) 


A null hypothesis states that there is no significant difference or relationship 
between two or more parameters. It concerns a judgment as to whether 
apparent differences or relationships are true differences or relationships 
or whether they merely result from sampling error. The experimenter 
formulates for statistical purposes a null hypothesis, a no-difference or 
no-relationship hypothesis. The experimenter hypothesizes that any apparent 
difference between the mean achievement of the experimental and control 
sample groups at the end of the experimental cycle is simply the result of 
sampling error, as explained by the operation of the central limit theorem. 
It should be noted that, although the null hypothesis is needed for statistical 
purposes, most actual hypotheses are alternatives to the null; that is, 
hypotheses that propose that differences will exist. 

The use of the null hypothesis is not restricted to experimental studies. 
It may be used when inferring generalizations about populations from 
sample data in descriptive research studies. 

Students have complained that the statement of a null hypothesis 
sounds like double talk. They are understandably puzzled about the reasons 
for the negative statement that the researcher attempts to reject. The ex- 
planation is somewhat involved, but the logic is sound. Verification of one 
consequence of a positive hypothesis does not prove it to be true. Observed 
consequences that may be consistent with a positive hypothesis may also be 
compatible with equally plausible but competing hypotheses. Verifying a 
positive hypothesis provides a rather inconclusive test. 

Rejecting a null or negative hypothesis provides a stronger test of 
logic. Evidence that is inconsistent with a particular negative hypothesis 
provides a strong basis for its rejection. Before a court of law, a defendant 
is assumed to be not guilty until the not-guilty assumption is discredited 
or rejected. In a sense, the not-guilty assumption is comparable to the null 
hypothesis. 

If the difference between the mean achievement of the experimental 
and the control groups is too great to attribute to the normal fluctuations 
that result from sampling error, the experimenter may reject the null 
hypothesis, saying in effect that it is probably not true that the difference 
is merely the result of sampling error. The means no longer behave as 
random sample means from the same population. Something has happened 
to, or affected, the experimental group in such a way that it behaves like 
a random sample from a different or changed population. Thus the re- 
searcher may conclude that the experimental variable or treatment prob- 
ably accounted for the difference in performance, as measured by the mean 
test scores. The experimenter is using a statistical test to discount chance 
or sampling error as an explanation for the difference. 

If the difference between means was not great enough to reject the 
null hypothesis, the researcher fails to reject it. He or she concludes that 
there was no significant difference and that chance or sampling error 
probably accounted for any observed difference. 


The Level of Significance 


The rejection or acceptance of a null hypothesis is based upon some level 
of significance (alpha level) as a criterion. In psychological and educational 
circles, the 5 percent (.05) alpha (α) level of significance is often used as a 
standard for rejection. Rejecting a null hypothesis at the .05 level indicates 
that a difference in means as large as that found between the experimental 
and control groups would have resulted from sampling error in less than 
5 out of 100 replications of the experiment. Sampling error alone is 
therefore an unlikely explanation, and the difference is attributed to the 
experimental treatment instead. 

A more rigorous test of significance is the 1 percent (.01) α level. 
Rejecting a null hypothesis at the .01 level would suggest that a difference 
in means as large as that found between the experimental and control 
groups would have resulted from sampling error in less than 1 in 100 
replications of the experiment. 

When samples are large (more than 30 in size) the t critical value 
approaches the z (sigma) score. In these cases, if the z value equals or exceeds 
1.96, we may conclude that the difference between means is significant at 
the .05 level. If the z value equals or exceeds 2.58, we may conclude that 
the difference between means is significant at the .01 level. Determining 
the exact t critical value is discussed later in this chapter. 

Using the example of the reading experiment previously described, 
let us supply the data and test the null hypothesis that there was no 
significant difference between the mean reading achievement of the initial 
teaching alphabet experimental group and the traditional alphabet control 
group. 

EXPERIMENTAL                 CONTROL 
ITA GROUP                    TRADITIONAL ALPHABET GROUP 
$N_1$ = 32                   $N_2$ = 34 
$\bar{X}_1$ = 87.43          $\bar{X}_2$ = 82.58 
$S_1^2$ = 39.40              $S_2^2$ = 40.80 

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{N_1} + \dfrac{S_2^2}{N_2}}} = \frac{87.43 - 82.58}{\sqrt{\dfrac{39.40}{32} + \dfrac{40.80}{34}}}$$

$$t = \frac{4.85}{\sqrt{1.23 + 1.20}} = \frac{4.85}{\sqrt{2.43}} = \frac{4.85}{1.56} = 3.11$$


Because a t value of 3.11 exceeds 2.58, the null hypothesis may be 
rejected at the .01 level of significance. If this experiment were replicated 
with random samples from the same population, the probability is that a 
difference between mean performance as great as that observed would 
result from sampling error in fewer than 1 out of 100 replications. This 
test would indicate rather strong evidence that the treatment would prob- 
ably make a difference in the teaching of reading when applied to similar 
populations of pupils. 
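
The large-sample test of the difference between two independent means can be sketched in a few lines. The function name `t_independent` is ours, and the critical values 1.96 and 2.58 are the large-sample two-tailed values given in the text; this is an illustration of the arithmetic, not a substitute for consulting the t table with small samples.

```python
import math


def t_independent(mean1, var1, n1, mean2, var2, n2):
    """t ratio: difference between means over the standard error of the difference."""
    standard_error_diff = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error_diff


# ITA (experimental) group versus traditional alphabet (control) group.
t = t_independent(87.43, 39.40, 32, 82.58, 40.80, 34)
print(f"t = {t:.2f}")

# Compare with the large-sample two-tailed critical values from the text.
for level, critical in ((".05", 1.96), (".01", 2.58)):
    decision = "reject" if abs(t) >= critical else "fail to reject"
    print(f"At the {level} level: {decision} the null hypothesis")
```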


DECISION MAKING 


Statistical decisions about parameters based upon evidence observed in 
samples always involve the possibility of error. Statisticians do not deal with 
decisions based upon certainty. They merely estimate the probability or 
improbability of occurrences of events. 

Rejection of a null hypothesis when it is really true is known as a Type 
I error. The level of significance (alpha) selected determines the probability 
of a Type I error. For example, when the researcher rejects a null hypothesis 
at the .05 level, he or she is taking a 5 percent risk of rejecting 
what should be a sampling error explanation when it is probably true. 

Not rejecting a null hypothesis when it is really false is known as a 
Type II error. This decision errs in accepting a sampling error explanation 
when it is probably false. 

Setting a level of significance as stringent as the .01 level reduces the 
risk of a Type I error. But this more conservative level of significance 
increases the risk of a Type II error. The researcher sets the level of 
significance based upon the relative seriousness of making a Type I or a 
Type II error. 
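
The meaning of a Type I error can be made concrete by simulation. In the sketch below, both groups are always drawn from the same population, so the null hypothesis is true by construction and every rejection is a Type I error. The population values (mean 100, standard deviation 15), the group size of 40, the 2,000 replications, and the seed are all assumptions made for illustration.

```python
import math
import random
import statistics


def t_ratio(group1, group2):
    """Difference between means divided by the standard error of the difference."""
    se = math.sqrt(
        statistics.variance(group1) / len(group1)
        + statistics.variance(group2) / len(group2)
    )
    return (statistics.fmean(group1) - statistics.fmean(group2)) / se


rng = random.Random(1)   # fixed seed so the run is repeatable
REPLICATIONS = 2000      # assumed number of repeated experiments
N = 40                   # assumed size of each group

rejections_05 = 0
rejections_01 = 0
for _ in range(REPLICATIONS):
    # Both groups come from the same population: any difference is sampling error.
    group_a = [rng.gauss(100, 15) for _ in range(N)]
    group_b = [rng.gauss(100, 15) for _ in range(N)]
    t = abs(t_ratio(group_a, group_b))
    rejections_05 += t >= 1.96
    rejections_01 += t >= 2.58

rate_05 = rejections_05 / REPLICATIONS
rate_01 = rejections_01 / REPLICATIONS
print(f"Type I error rate at the .05 level: {rate_05:.3f}")
print(f"Type I error rate at the .01 level: {rate_01:.3f}")
```

The observed rates should fall near .05 and .01, which is what the chosen alpha level promises. The simulation says nothing about Type II errors, which depend on how large the true difference is.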


Two-Tailed and One-Tailed Tests of Significance 


If a null hypothesis was proposed that there was no difference (other than 
in sampling error) between the mean IQs of athletes and nonathletes, we 
would be concerned only with a difference and not with the superiority or 
inferiority of either group. 


There is no difference between the mean IQs of athletes and nonathletes. 

In this situation we apply a two-tailed test. 


If we changed the null hypothesis to indicate the superiority or in- 
feriority of either group it might be stated: 


Athletes do not have higher IQs than nonathletes. 
or 
Athletes do not have lower IQs than nonathletes. 


Each of these hypotheses indicates a direction of difference. When 
researchers are hypothesizing a direction of difference, rather than the 
mere existence of a difference, they can sometimes use a one-tailed test. 

For a large sample two-tailed test, the 5 percent area of rejection is 
divided between the upper and lower tails of the curve (2.5 percent at each 
end), and it is necessary to go out to ±1.96 on the sigma (z) scale to reach 
the area of rejection (Figure 9-3). 

For a one-tailed test, since the 5 percent area of rejection is either at 
the upper tail or at the lower tail of the curve, the t critical value is lower, 
for it is not necessary to go as far out on the sigma scale to reach the area 
of rejection (Figure 9-4). The t critical value in such a case is +1.645 or 
-1.645, depending on the direction hypothesized. 


Large Sample t Critical Values for Rejection of the Null Hypothesis 

                      .05 level    .01 level 
Two-tailed test         1.96         2.58 
One-tailed test         1.645        2.33 

FIGURE 9-3 A two-tailed test at the .05 level (2.5 percent at each end). 


A similar pair of curves would illustrate the difference between t 
critical areas of rejection at the 1 percent level of significance. The t values 
must equal or exceed these critical values for the rejection of a null 
hypothesis. 
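
The large-sample critical values come straight from the normal curve and can be recomputed rather than looked up. The sketch below uses `NormalDist` from the Python standard library; the helper name `z_critical` is ours.

```python
from statistics import NormalDist


def z_critical(alpha, tails):
    """Large-sample critical value for a one-tailed (1) or two-tailed (2) test."""
    # A two-tailed test splits alpha between both ends of the curve.
    return NormalDist().inv_cdf(1 - alpha / tails)


for tails, label in ((2, "Two-tailed"), (1, "One-tailed")):
    at_05 = z_critical(0.05, tails)
    at_01 = z_critical(0.01, tails)
    print(f"{label} test:  .05 level = {at_05:.3f}   .01 level = {at_01:.3f}")
```

The one-tailed values are smaller because the whole rejection area sits in a single tail, so one need not go as far out on the z scale to reach it.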

FIGURE 9-4 A one-tailed test at the .05 level (5 percent at one end or 5 percent at the other end). 

Because the t values needed to reject a null hypothesis are smaller for 
a one-tailed test, and because most researchers would like to reject the null 
hypothesis, it is tempting always to propose a directional hypothesis so as 
to be able to use a one-tailed test. However, a one-tailed test should only 
be used when a directional hypothesis is actually proposed for logical or 
theoretical reasons prior to the collection of even preliminary data. If 
a reasonable alternative hypothesis could be proposed in the opposite 
direction, then, even if a directional hypothesis is to be tested, a two-tailed 
test should be used. Hypotheses that athletes would have higher or lower 
IQs than nonathletes are probably inappropriate for one-tailed tests. A 
better example of a directional hypothesis, for which a one-tailed test would 
be appropriate, would be: 


Children will score higher on a reading achievement test after first 
grade than they did prior to first grade. 


In this case, while no difference might be found, it is very unlikely that 
findings would be in the opposite direction, reading being lower after first 
grade. 

The test of the significance of the difference between two independent 
means to this point has concerned large samples, and the critical t values 
for rejection of the null hypothesis have been found in the normal 
probability table. 

When small samples are used to infer population differences, a different 
set of t critical values is used. But before discussing small sample 
tests, an important concept known as degrees of freedom should be 
considered. 


Degrees of Freedom 


The number of degrees of freedom in a distribution is the number of 
observations or values that are independent of each other, that cannot be 
deduced from each other. Although this concept has been puzzling to 
students of statistics, several analogies and their application to estimation 
or prediction may help to clarify it. 


1. Let us assume that a coin is tossed in the air. The statistician predicts 
that a head will turn up. If a head comes up, he or she has made one 
correct, independent prediction. But if the statistician predicted that 
a head would turn up and a tail would face down, he or she has made 
two predictions. Only one prediction, however, is an independent 
prediction, for the other can be deduced from the first. The second 
added no new information. In this case there was one degree of 
freedom, not two. 

The strength of a prediction is increased as the number of in- 
dependent observations or degrees of freedom is increased. 

2. When a mean is computed from a number of terms in a distribution, 
the sum is calculated and divided by N. 

$$\bar{X} = \frac{\Sigma X}{N}$$

But in computing a mean, 1 degree of freedom is used up or lost, 
and subsequent calculations of the variance and the standard deviation 
will be based on N - 1 independent observations or N - 1 degrees 
of freedom. An example of the loss of a degree of freedom follows. 


A                    B 
ORIGINAL             ALTERED 
DISTRIBUTION         DISTRIBUTION 
+5                   +15 
+4                   +8        These four terms can be 
+3                   +5        altered in any way. 
+2                   +7 
+1                   -20       This term is dependent on, or 
                               determined by, the other four terms. 
ΣX = +15             ΣX = +15 
N = 5                N = 5 
$\bar{X}$ = +3       $\bar{X}$ = +3 


In the altered distribution, the fifth term must have a value of -20 
for the sum to equal +15, the mean to be +3, and the sum of the deviations 
from the mean to equal zero. Thus, four terms are independent and can 
be altered, but one is dependent or fixed and is deduced from the other 
four. There are N - 1 (5 - 1) or 4 degrees of freedom. 
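
The loss of a degree of freedom can be demonstrated in a few lines. In this sketch the four freely chosen values (15, 8, 5, and 7) are illustrative; any four values would do, and the helper name `dependent_term` is ours. Once the sum is fixed, the fifth term is no longer free.

```python
def dependent_term(free_terms, required_sum):
    """The one remaining term is fixed once the others and the sum are known."""
    return required_sum - sum(free_terms)


original = [5, 4, 3, 2, 1]
required_sum = sum(original)          # +15, which fixes the mean at +3

free_terms = [15, 8, 5, 7]            # four terms altered at will (illustrative)
fifth = dependent_term(free_terms, required_sum)
altered = free_terms + [fifth]

mean = sum(altered) / len(altered)
deviations = [value - mean for value in altered]

print("fifth term:", fifth)
print("mean:", mean)
print("sum of deviations:", sum(deviations))
print("degrees of freedom:", len(altered) - 1)
```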


Standard deviation for samples (S). In Chapter 8 we described the variance 
and standard deviation for a population. Because most of the time 
we use samples selected from the population, it is necessary to introduce 
the formulas for the variance ($S^2$) and the standard deviation (S) of a sample. 
The sample formulas differ only slightly from the population formulas. 
As we will see, instead of dividing by N in the deviation formula and by 
$N^2$ in the raw score formula, the sample formulas divide by n - 1 and 
n(n - 1), respectively.¹ This is done to correct for the probability that the 
smaller the sample, the less likely it is that extreme scores will be included. 
Thus the formula for σ, if used with a sample, would underestimate the 
standard deviation of the population. Dividing by n - 1 or n(n - 1) corrects 
for this bias more or less depending upon the sample's size. This makes 
the standard deviation of the sample more representative of the population. 
In a small sample, say n = 5, the correction is rather large, dividing by 4 
instead of 5 (a reduction of 20 percent in the denominator). In a large 
sample, say n = 100, the correction is insignificant, dividing by 99 instead 
of 100 (a reduction of 1 percent in the denominator). 


¹ N represents the number of subjects in the population; n represents the number of subjects 
in a sample. 

The two formulas for sample standard deviation with the deviation 
and the raw score methods of computation, respectively, are: 

$$S^2 = \frac{\Sigma x^2}{n - 1} \qquad S = \sqrt{\frac{\Sigma x^2}{n - 1}}$$

and 

$$S^2 = \frac{n\Sigma X^2 - (\Sigma X)^2}{n(n - 1)} \qquad S = \sqrt{\frac{n\Sigma X^2 - (\Sigma X)^2}{n(n - 1)}}$$


No doubt the reader can see that the only changes are in the denominator. 
Thus, if we substitute n(n - 1) for $N^2$ and calculate $S^2$ and S using the 
data from Chapter 8, we would find the following: 

$$S^2 = \frac{9(52{,}125) - (675)^2}{9(8)} = \frac{469{,}125 - 455{,}625}{72} = \frac{13{,}500}{72} = 187.50$$

$$S = \sqrt{187.50} = 13.69$$


These results are quite a change from $\sigma^2$ = 166.67 (change of +20.83) 
and σ = 12.91 (change of +0.78). These relatively large differences from 
the population formula to the sample formula are due to the small sample 
size (n = 9), which made a relatively large correction necessary. The cor- 
rection for calculating the variance and standard deviation is important 
because unless the loss of a degree of freedom is considered, the calculated 
sample variance or standard deviation is likely to underestimate the pop- 
ulation variance or standard deviation. This is true because the mean of 
the squared deviations from the mean of any distribution is the smallest 
possible value, and probably would be smaller than the mean of the squared 
deviation from any other point in the distribution. Because the mean of 
the sample is not likely to be identical to the population mean (because of 
sampling error), the use of the number of degrees of freedom, rather than 
N in the denominator, tends to correct for this underestimation of the 
population variance or standard deviation. 
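
The effect of the correction can be verified with the raw score formula. The sketch below uses the sums quoted in the worked example (ΣX = 675, ΣX² = 52,125, n = 9); the function name `variance_from_sums` and its `sample` flag are our own.

```python
import math


def variance_from_sums(sum_x, sum_x_squared, n, sample=True):
    """Raw score formula; divides by n(n - 1) for a sample, n squared for a population."""
    numerator = n * sum_x_squared - sum_x ** 2
    denominator = n * (n - 1) if sample else n ** 2
    return numerator / denominator


sample_var = variance_from_sums(675, 52125, 9, sample=True)
population_var = variance_from_sums(675, 52125, 9, sample=False)

print(f"sample variance:      {sample_var:.2f}  (S = {math.sqrt(sample_var):.2f})")
print(f"population variance:  {population_var:.2f}  (sigma = {math.sqrt(population_var):.2f})")
print(f"difference:           {sample_var - population_var:.2f}")
```

With only nine scores the two denominators (72 and 81) differ substantially; with a large sample the two results would be nearly indistinguishable.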

The strength of a prediction or the accuracy of an inferred value 
increases as the number of independent observations (sample size) is in- 
creased. Because large samples may be biased, sample size is not the only 
important determinant, but if unbiased samples are selected randomly from 
a population, large samples will provide a more accurate basis than will 
smaller samples for inferring population values. 

When small samples are involved, the t table is used to determine statistical 
significance, rather than the normal probability table. This concept of small 
sample size was developed by William Sealy Gosset, a consulting 
statistician for Guinness Breweries of Dublin, Ireland, who published it in 
1908. Because his employer's rules prohibited publication under the 
researcher's name, he signed the name "Student" when he published his findings. 

Gosset determined that the distribution curves of small sample means 
were somewhat different from the normal curve. Small sample distributions 
were observed to be lower at the means and higher at the tails or ends of 
the distributions. 

Gosset's t critical values, carefully calculated for small samples, are 
reproduced in the t distribution table in Appendix D. The t critical values 
necessary for rejection of a null hypothesis are higher for small samples at 
a given level of significance (see Figure 9-5). Each t critical value for 
rejection is based upon the appropriate number of degrees of freedom. 

As the sample sizes increase, the t critical values necessary for rejection 
of a null hypothesis diminish and approach the z values of the normal 
probability table. 


Significance of the Difference between Two Small 
Sample Independent Means 


When the samples are small and their variances are equal or nearly equal, 
the method of pooled variances provides the appropriate test of the sig- 
nificance of the difference between two independent means. 

The formula is a bit more involved than the one previously illustrated, 
but it provides a more precise test of significance. The appropriate t critical 
value for rejection of the null hypothesis would be found for $N_1 + N_2 - 2$ 
degrees of freedom, using the t distribution table. 

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(N_1 - 1)S_1^2 + (N_2 - 1)S_2^2}{N_1 + N_2 - 2}\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)}}$$


For example, in comparing the significance of the mean IQ difference 
between samples of 8 athletes and 10 nonathletes, the number of degrees 
of freedom would be $N_1 + N_2 - 2$ = 8 + 10 - 2 = 16. From the t distribution 
table at 16 degrees of freedom, the t critical values necessary for the 
rejection of the null hypothesis would be: 

                          LEVEL OF SIGNIFICANCE 
16 DEGREES OF FREEDOM       .05        .01 
Two-tailed test            2.120      2.921 
One-tailed test            1.746      2.583 

FIGURE 9-5 Distribution of large and small sample means. 


HOMOGENEITY OF VARIANCES 


In t tests for small samples, one condition must be met to justify the method 
of pooled variances. This condition is known as equality or homogeneity of 
variance. It does not literally mean that the variances of the samples to be 
compared must be identical, but only that they do not differ by an amount 
that is statistically significant. Differences that would be attributed to 
sampling error do not impair the validity of the process. 

To determine whether the samples meet the criterion of equality of 
variances an $F_{\max}$ test is used. 

$$F_{\max} = \frac{S^2 \text{ (largest variance)}}{S^2 \text{ (smallest variance)}}$$


This F ratio is never less than one, for the largest variance is always 
divided by the smallest. To test for homogeneity of variance, an F distribution 
table is used in much the same way as the t distribution table. F 
critical values are presented for determining the statistical significance of 
the calculated F ratio, based upon the appropriate rows and columns, 
each at N - 1 degrees of freedom. 

A few .05-level-of-significance values from the $F_{\max}$ distribution table 
are presented in Table 9-1. The degrees of freedom for the largest group 
are used if the samples differ in size. With t tests, there will only be two 
variances. For Analysis of Variance (discussed later in this chapter), there 
usually will be more than two variances. 

TABLE 9-1 Distribution of F (.05 level) 

                              NUMBER OF VARIANCES 
                              2       3       4 
Degrees of             9     4.03    5.34    6.31 
freedom for           10     3.72    4.85    5.67 
largest group         12     3.28    4.16    4.79 
(N - 1)               15     2.86    3.54    4.01 

Unless the calculated F equals or exceeds the appropriate F critical 
value, it may be assumed that the variances are homogeneous and the 
difference is not significant. 

For example, if two samples with 10 degrees of freedom (greater 
variance 38.40) and 12 degrees of freedom (smaller variance 18.06) were 
subjected to the test of homogeneity: 

$$F = \frac{38.40}{18.06} = 2.13$$

An F critical value of 3.28 must be equaled or exceeded to determine 
that the difference between variances is significant at the .05 level. In this 
example, since 2.13 < 3.28, the researcher would conclude that the variances 
fulfilled the condition of homogeneity and that the method of pooled 
variances is appropriate. An example using small samples illustrates the 
process of calculating the F ratio to test homogeneity of variance and then 
calculating the appropriate t test. 

The mean score of 10 delinquent boys on a personal adjustment 
inventory was compared with the mean score of 12 nondelinquent boys, 
both groups selected at random. Test the null hypothesis that there is no 
statistically significant difference between the mean test scores at the .01 
level of significance. 


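DELINQUENT BOYS          NONDELINQUENT BOYS 
$\bar{X}_1$ = 9          $\bar{X}_2$ = 14 
$S_1^2$ = 20.44          $S_2^2$ = 19.60 
$N_1$ = 10               $N_2$ = 12 

$$F = \frac{20.44}{19.60} = 1.04 \text{ (the variances are homogeneous)}$$

df = 10 + 12 - 2 = 20 

$$t = \frac{\bar{X}_2 - \bar{X}_1}{\sqrt{\dfrac{(N_2 - 1)S_2^2 + (N_1 - 1)S_1^2}{N_2 + N_1 - 2}\left(\dfrac{1}{N_2} + \dfrac{1}{N_1}\right)}}$$

$$t = \frac{14 - 9}{\sqrt{\dfrac{11(19.60) + 9(20.44)}{12 + 10 - 2}\left(\dfrac{1}{12} + \dfrac{1}{10}\right)}} = \frac{5}{\sqrt{\dfrac{215.60 + 183.96}{20}\left(\dfrac{11}{60}\right)}}$$

$$t = \frac{5}{\sqrt{19.98\left(\dfrac{11}{60}\right)}} = \frac{5}{\sqrt{3.66}} = \frac{5}{1.91} = 2.62$$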

Because this is a two-tailed test, the t critical value for rejection of the null 
hypothesis at the .01 level of significance for 20 degrees of freedom is 
2.845. 

Because the calculated value is 2.62, it does not equal or exceed the 
t critical value necessary for rejection of the null hypothesis at the .01 level 
for 20 degrees of freedom; the hypothesis is not rejected, and we conclude 
that there is no significant difference. 

Had we used the .05 level of significance for 20 degrees of freedom, 
the t critical value necessary for rejection would be 2.086, and we could 
have rejected the null hypothesis, for our calculated t ratio of 2.62 
exceeds the 2.086 t table value. 
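
The two steps of the small-sample procedure, the F ratio for homogeneity followed by the pooled-variance t test, can be sketched as follows. The function names are ours, and the critical values are simply those quoted in the text for 20 degrees of freedom. Carried out without rounding the intermediate steps, the arithmetic gives a t of about 2.61 rather than 2.62; the decision at either level is unaffected.

```python
import math


def f_ratio(var_a, var_b):
    """Largest variance divided by smallest, so the ratio is never below one."""
    return max(var_a, var_b) / min(var_a, var_b)


def t_pooled(mean1, var1, n1, mean2, var2, n2):
    """Pooled-variance t ratio and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error_diff = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error_diff, df


# Nondelinquent boys (mean 14) versus delinquent boys (mean 9).
F = f_ratio(20.44, 19.60)
t, df = t_pooled(14, 19.60, 12, 9, 20.44, 10)

print(f"F = {F:.2f}")
print(f"t = {t:.2f} with {df} degrees of freedom")

# Two-tailed critical values for 20 degrees of freedom, as quoted in the text.
for level, critical in ((".05", 2.086), (".01", 2.845)):
    decision = "reject" if abs(t) >= critical else "fail to reject"
    print(f"At the {level} level: {decision} the null hypothesis")
```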

By using the data from a previous example, comparing reading 
achievement of a group using the ITA reading method with that of the 
control group, we can see that this formula gives us the same result as did 
the formula used in that example: 


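ITA                      CONTROL 
$N_1$ = 32               $N_2$ = 34 
$\bar{X}_1$ = 87.43      $\bar{X}_2$ = 82.58 
$S_1^2$ = 39.40          $S_2^2$ = 40.80 

$$F = \frac{40.80}{39.40} = 1.04 \text{ (the variances are homogeneous)}$$

df = 32 + 34 - 2 = 64 

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(N_1 - 1)S_1^2 + (N_2 - 1)S_2^2}{N_1 + N_2 - 2}\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)}}$$

$$t = \frac{87.43 - 82.58}{\sqrt{\dfrac{31(39.4) + 33(40.8)}{64}\left(\dfrac{1}{32} + \dfrac{1}{34}\right)}} = \frac{4.85}{\sqrt{\dfrac{1221.4 + 1346.4}{64}\left(\dfrac{34}{1088} + \dfrac{32}{1088}\right)}}$$

$$t = \frac{4.85}{\sqrt{40.12\left(\dfrac{66}{1088}\right)}} = \frac{4.85}{\sqrt{2.43}} = \frac{4.85}{1.56} = 3.11$$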

Thus the two formulas presented in this chapter for comparing the 
means of two independent samples yield the same result for these data, as 
they will whenever the sample sizes and the variances are nearly equal. 
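
The agreement between the two formulas can be checked numerically. The sketch below applies both to the ITA data and then to a second, purely hypothetical set of values (unequal sample sizes with very unequal variances) that we have invented to show when the formulas part company. The function names are ours.

```python
import math


def t_separate(mean1, var1, n1, mean2, var2, n2):
    """First formula: each sample variance divided by its own N."""
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)


def t_pooled(mean1, var1, n1, mean2, var2, n2):
    """Second formula: the method of pooled variances."""
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled * (1 / n1 + 1 / n2))


ita_data = (87.43, 39.40, 32, 82.58, 40.80, 34)   # from the reading experiment
hypothetical = (60, 100.0, 10, 50, 25.0, 40)      # invented for illustration only

for label, data in (("ITA example", ita_data), ("hypothetical", hypothetical)):
    print(f"{label}: separate = {t_separate(*data):.2f}, pooled = {t_pooled(*data):.2f}")
```

For the ITA data both formulas round to 3.11. For the invented data they differ markedly, which is why the test of homogeneity of variance precedes the use of pooled variances.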


Significance of the Difference between the Means 
of Two Matched or Correlated Groups 
(Nonindependent Samples) 


The two previous examples of testing the significance of the difference 
between two independent means assume that the individuals were ran- 
domly assigned to the control and experimental groups. There are situa- 
tions when it is appropriate to determine the significance of the difference 
between means of groups that are not randomly assigned. Two such sit- 
uations are: 


1. When the pairs of individuals who make up the groups have been 
matched on one or more characteristics—IQ, reading achievement, 
identical twins, or on some other basis for equating the individuals. 

2. When the same group of individuals takes a pretest, is exposed to a 
treatment, and then is retested to determine whether the influence 
of the treatment has been statistically significant, as determined by 
mean gain scores. 


Because the groups are not independent samples, it is necessary to 
calculate the coefficient of correlation between: 


1. the posttest scores of the matched pairs sample; or 
2. the pretest and posttest scores of the participants in the experiment. 


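Using the coefficient of correlation, the appropriate t test would be 
based upon this formula: 

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{N_1} + \dfrac{S_2^2}{N_2} - 2r\left(\dfrac{S_1}{\sqrt{N_1}}\right)\left(\dfrac{S_2}{\sqrt{N_2}}\right)}}$$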

The number of degrees of freedom would be the number of pairs 
minus one. Two examples illustrate situations A and B: 


Example A. Two groups, each made up of 20 fifth-grade students, 
were matched on the basis of IQs. Filmstrips were used to teach the ex- 
perimental group; the control group was exposed to a conventional "read 
and discuss" method. 

The researcher wished to test the null hypothesis that there was no 
difference between the mean achievement of the two groups (a two-tailed 
test) at the .05 level. 


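$$N_1 = 20 \qquad N_2 = 20$$
$$S_1^2 = 54.76 \qquad S_2^2 = 42.25$$
$$\bar{X}_1 = 53.20 \qquad \bar{X}_2 = 49.80$$
$$r = +.60 \qquad df = 19$$

$$F = \frac{54.76}{42.25} = 1.30 \quad \text{(variances are homogeneous)}$$

$$t = \frac{53.20 - 49.80}{\sqrt{\dfrac{54.76}{20} + \dfrac{42.25}{20} - 2(.60)\left(\dfrac{7.40}{\sqrt{20}}\right)\left(\dfrac{6.50}{\sqrt{20}}\right)}}$$

$$t = \frac{3.40}{\sqrt{2.74 + 2.11 - 1.20(1.66)(1.45)}} = \frac{3.40}{\sqrt{1.96}} = \frac{3.40}{1.40} = 2.43$$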


Because the t value of 2.43 exceeds the t critical value of 2.093 for a
two-tailed test at the .05 level at 19 degrees of freedom, the null hypothesis
may be rejected.
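The formula for matched or correlated groups can be sketched in a few lines of Python. This is only an illustration: the function name `correlated_t` and its argument names are assumptions introduced here, and the values passed in are the summary statistics of Example A, with r taken as .60.

```python
import math


def correlated_t(mean1, mean2, var1, var2, n1, n2, r):
    """t for the difference between the means of two matched or correlated groups.

    Works from summary statistics: means, variances, group sizes, and the
    coefficient of correlation between the paired scores.
    """
    se1 = math.sqrt(var1) / math.sqrt(n1)  # S1 / sqrt(N1)
    se2 = math.sqrt(var2) / math.sqrt(n2)  # S2 / sqrt(N2)
    denominator = math.sqrt(var1 / n1 + var2 / n2 - 2 * r * se1 * se2)
    return (mean1 - mean2) / denominator


# Example A (hypothetical variable name): 20 matched pairs, df = 19
t_a = correlated_t(53.20, 49.80, 54.76, 42.25, 20, 20, 0.60)
print(round(t_a, 2))
```

Because the sketch carries full precision through every step, its result can differ in the last decimal place from a hand calculation that rounds intermediate values.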


of his class of 30 students. He administered a timed speed/accuracy test 
and recorded the score for each student. The next day, after 10 minutes 
of class participation in a TM exercise, he administered a similar timed 
speed/accuracy test. 

He computed the mean scores for the pretest and the scores obtained 
after the TM experience and calculated the coefficient of correlation be- 
tween the pairs of scores to be +.84. 

He then tested the null hypothesis that the TM experience would not 
improve the proficiency in speed and accuracy of typing of his class. He 
chose the .01 level of significance, using a one-tailed test. 


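PRETEST                  TEST AFTER TM

$$N_1 = 30 \qquad N_2 = 30$$
$$S_1^2 = 37.21 \qquad S_2^2 = 36.10$$
$$\bar{X}_1 = 44.80 \qquad \bar{X}_2 = 49.10$$
$$r = +.84 \qquad df = 29$$

$$F = \frac{37.21}{36.10} = 1.03 \quad \text{(variances are homogeneous)}$$

$$t = \frac{49.10 - 44.80}{\sqrt{\dfrac{37.21}{30} + \dfrac{36.10}{30} - 2(.84)\left(\dfrac{6.10}{\sqrt{30}}\right)\left(\dfrac{6.01}{\sqrt{30}}\right)}}$$

$$t = \frac{4.30}{\sqrt{1.24 + 1.20 - 1.68(1.11)(1.10)}} = \frac{4.30}{\sqrt{2.44 - 2.05}} = \frac{4.30}{\sqrt{.39}}$$

$$t = \frac{4.30}{.62} = 6.94$$
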
Because the t value of 6.94 exceeds the t critical value of 2.462 for a
one-tailed test at the .01 level for 29 degrees of freedom, he rejected the 
null hypothesis, concluding that the meditation experience did improve 
performance proficiency. 


Statistical Significance of a Coefficient of Correlation 


Throughout this chapter on inferential data analysis, the idea of statistical 
significance and its relationship to the null hypothesis have been empha- 
sized. An observed coefficient of correlation may result from chance or 
sampling error, and a test to determine its statistical significance is appro- 
priate. In small sample correlations, chance could yield what might appear 
to be evidence of a genuine relationship. 

The null hypothesis ($H_0$) states that the coefficient of correlation is
zero. Only when chance or sampling error has been discredited on a prob- 
ability basis can a coefficient of correlation be accepted as statistically sig- 
nificant. One test of the significance of r is determined by the use of the 
formula: 


$$t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$$

With N - 2 degrees of freedom, a coefficient of correlation is judged
as statistically significant when the t value equals or exceeds the t critical
value in the t distribution table.


If r = .40 and N = 25:

$$t = \frac{.40\sqrt{23}}{\sqrt{1 - (.40)^2}} = \frac{1.92}{.92} = 2.09$$

Using a two-tailed test at the .05 level with 23 degrees of freedom,
the null hypothesis is rejected because the computed t value of 2.09 exceeds
the t critical value of 2.07. As sample size is decreased, the probability of
sampling error increases. For a smaller sample, the coefficient must be
larger to be statistically significant.


If r = .40 and N = 18:

$$t = \frac{.40\sqrt{16}}{\sqrt{1 - (.40)^2}} = \frac{1.60}{.92} = 1.74$$

At 16 degrees of freedom the observed value of 1.74 fails to equal or
exceed the t critical value of 2.12 at the .05 level of significance, and the
null hypothesis would not be rejected. Thus, with a sample N of 18, a
coefficient of correlation of .40 would not be large enough to reject the
null hypothesis; sampling error remains a plausible explanation.
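The same test can be sketched in Python to show how the t value for a fixed r shrinks as the sample gets smaller. The function name `t_for_r` is an assumption made for this illustration; the critical values are those quoted in the text.

```python
import math


def t_for_r(r, n):
    """t value for testing whether a coefficient of correlation differs from zero.

    Uses t = r * sqrt(N - 2) / sqrt(1 - r^2), with N - 2 degrees of freedom.
    """
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


t_25 = t_for_r(0.40, 25)  # 23 degrees of freedom, critical value 2.07
t_18 = t_for_r(0.40, 18)  # 16 degrees of freedom, critical value 2.12

print(t_25 >= 2.07)  # significant with N = 25
print(t_18 >= 2.12)  # not significant with N = 18
```

The unrounded results may differ from the hand calculations in the second decimal place, because the hand calculations round the denominator before dividing.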

There is a more direct and simple way to evaluate the statistical
significance of the coefficient of correlation. Instead of computing the t value
and using the table in Appendix D, critical values of r can be read directly
from the table in Appendix C at the .10, .05, .02, and .01 levels.

Statistical significance merely indicates the probable influence of chance 
or sampling error upon an observed coefficient of correlation between 


conducting small sample studies, but of less importance in large sample 
research. 

The reader should remember from Chapter 8 that the interpretation
of a correlation involves more than just its statistical significance. A low
correlation, for example r = .20, is statistically significant if a large sample,
such as 200 subjects, was used. Despite its statistical significance, this
correlation, .20, is still a low correlation, with the two variables having only
.04 of their variance in common.

The values presented in the table in Chapter 8 provide guidelines for
evaluating the magnitude of r, but they should be interpreted cautiously
in terms of several criteria:


1. The magnitude and statistical significance of the coefficient of
correlation

2. The nature of the variables

3. The design of the study

4. The reported findings of other respected investigators in the field of
inquiry


ANALYSIS OF VARIANCE (ANOVA) 


We have noted that the t test is employed to determine, after treatment,
whether the means of two random samples were too different to attribute
to chance or sampling error. The analysis of variance is an effective way
to determine whether the means of more than two samples are too different
to attribute to sampling error.

It would be possible to use a number of t tests to determine the
significance of the difference between five means, two at a time, but it
would involve ten separate tests. The number of necessary pair-wise
comparisons of N things is determined by the formula:

$$\frac{N(N - 1)}{2}$$

If N = 5:

$$\frac{5(5 - 1)}{2} = \frac{20}{2} = 10$$


Analysis of variance makes it possible to determine whether the five
means differ significantly with a single test, rather than ten. Another
advantage lies in the fact that computing a number of separate t tests will
increase the overall Type I error rate for the experiment. For instance, if
we calculated ten t tests (for comparing five means) and accepted .05 as
our significance level, the probability that we would reject at least one
null hypothesis when it is really true (Type I error) could be as high as ten
times .05, or .50 (it would be about .40 if the ten tests were independent).
Thus we would have an unacceptably high error rate for the total
experiment. Analysis of variance takes care of this by comparing all five
means simultaneously in a single test.
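The growth in the overall error rate can be illustrated with a short Python sketch. The function names are assumptions introduced for this illustration, and the second function assumes that the separate tests are independent of one another, which is only an approximation when the same groups appear in several comparisons.

```python
def pairwise_comparisons(n_groups):
    """Number of distinct pairs among n_groups means: N(N - 1) / 2."""
    return n_groups * (n_groups - 1) // 2


def familywise_error(alpha, n_tests):
    """Chance of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


tests_needed = pairwise_comparisons(5)
upper_bound = tests_needed * 0.05  # ten times .05, the most the rate could be
independent_rate = familywise_error(0.05, tests_needed)

print(tests_needed, round(upper_bound, 2), round(independent_rate, 2))
```

Whichever figure is used, the chance of at least one false rejection is many times the .05 level chosen for each single test, which is the point of the comparison with a single F test.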

In single classification, or one-way analysis of variance, the relationship 
between one independent and one dependent variable is examined. For example: 


A test of abstract reasoning is administered to three randomly selected groups 
of students in a large state university majoring in mathematics, philosophy, 
and chemistry. Are the mean test scores of each of the three groups signif- 
icantly different from one another? 




The analysis of variance consists of these operations: 


1. The scores of the three groups are combined into one composite
group, and the variance of that composite group is computed. This is
known as the total groups variance ($V_t$).

2. The mean value of the variances of each of the three groups,
computed separately, is known as the within-groups variance ($V_w$).

3. The difference between the total groups variance and the
within-groups variance is known as the between-groups variance
($V_t - V_w = V_b$).

4. The F ratio is computed:

$$F = \frac{V_b}{V_w} = \frac{\text{between-groups variance}}{\text{within-groups variance}}$$


The logic of the F ratio is as follows. The within-groups variance 
represents the sampling error in the distributions and is also referred to 
as the error variance or residual. The between-groups variance represents 
the influence of the variable of interest or the experimental variable. If 
the between-groups variance is not substantially greater than the within-groups 
variance, the researcher would conclude that the difference between the 
means is probably only a reflection of sampling error. If the F ratio were 
substantially greater than one, it would seem that the ratio of the between- 
groups variance and the within-groups variance was probably too great to 
attribute to sampling error.

The critical values of the F ratio (named for Sir Ronald Fisher) are
found in an F table (different from the $F_{max}$ table referred to earlier), which
indicates the critical values necessary to test the null hypothesis at selected
levels of significance.

As can be seen in the F table presented in Appendix F, there are two
different degrees of freedom, one for $V_b$ (the numerator) and one for $V_w$
(the denominator). The degrees of freedom for the within-groups variance
($V_w$) are determined in the same way as for the t test; that is, the sum
of the subjects for all of the groups minus the number of groups. We can
use K to represent the number of groups and $N_1 + N_2 + \cdots - K$ to
represent the degrees of freedom for the within-groups variance. In the
above example, if we had ten students in each of the three groups, we
would have 10 + 10 + 10 - 3, or 27, degrees of freedom for the
within-groups variance. The degrees of freedom for the between-groups variance
($V_b$) are determined by the number of groups minus one (K - 1). In the
above example, there are three groups, thus two degrees of freedom. The
above example then has two degrees of freedom for the numerator and
27 for the denominator, for a total of 29, one less than the total number
of subjects.

The calculation of F involves finding the mean of the deviations from
the mean, squared. Thus the between-groups variance ($V_b$) is more
commonly referred to as the mean squared between ($MS_b$), and the
within-groups variance ($V_w$) is more commonly referred to as the mean squared
within ($MS_w$). The formula then becomes:

$$F = \frac{MS_b}{MS_w}$$


TABLE 9-2 Sample Data for Calculating Analysis of Variance

      GROUP 1                 GROUP 2                 GROUP 3
 MATHEMATICS MAJORS      PHILOSOPHY MAJORS       CHEMISTRY MAJORS
   X₁        X₁²           X₂        X₂²           X₃        X₃²
   18        324           26        676           18        324
   22        484           27        729           14        196
   18        324           18        324           15        225
   23        529           22        484           14        196
   19        361           23        529           19        361
   24        576           19        361           21        441
   20        400           27        729           17        289
   21        441           26        676           17        289
   19        361           24        576           18        324
   25        625           26        676           19        361

 ΣX₁ = 209  ΣX₁² = 4425   ΣX₂ = 238  ΣX₂² = 5760   ΣX₃ = 172  ΣX₃² = 3006

 X̄₁ = 20.9                X̄₂ = 23.8                X̄₃ = 17.2

 X̄ = 20.63        ΣX² = ΣX₁² + ΣX₂² + ΣX₃² = 13,191


Given the data in Table 9-2, we would calculate F as follows. The
first step is to find the sum of the squared deviations of each person's score
from the mean of all of the subjects. This is known as the total sum of squares
($SS_t$) and can be found by using the following formula:

$$SS_t = \Sigma X^2 - \frac{(\Sigma X)^2}{N}$$

In our example this would be:

$$SS_t = 13{,}191 - \frac{(619)^2}{30} = 13{,}191 - 12{,}772.03 = 418.97$$


The next step is to divide the total sum of squares into the
between-groups sum of squares ($SS_b$) and the within-groups sum of squares ($SS_w$).
We determine $SS_b$ using the formula:

$$SS_b = \frac{(\Sigma X_1)^2}{n_1} + \frac{(\Sigma X_2)^2}{n_2} + \frac{(\Sigma X_3)^2}{n_3} - \frac{(\Sigma X)^2}{N}$$

n = the number of subjects in a group
N = the number of subjects for all the groups combined

In our example this would be:

$$SS_b = \frac{(209)^2}{10} + \frac{(238)^2}{10} + \frac{(172)^2}{10} - \frac{(619)^2}{30}$$
$$SS_b = 4368.1 + 5664.4 + 2958.4 - 12{,}772.03$$
$$SS_b = 218.87$$
The within-groups sum of squares ($SS_w$) can be calculated in two ways.
First, we can calculate it directly using the formula:

$$SS_w = \Sigma X_1^2 - \frac{(\Sigma X_1)^2}{n_1} + \Sigma X_2^2 - \frac{(\Sigma X_2)^2}{n_2} + \Sigma X_3^2 - \frac{(\Sigma X_3)^2}{n_3}$$

In our example this would be:

$$SS_w = 4425 - 4368.1 + 5760 - 5664.4 + 3006 - 2958.4$$
$$SS_w = 200.1$$


$SS_w$ can also be found by subtracting $SS_b$ from $SS_t$:

$$SS_w = SS_t - SS_b$$
$$SS_w = 418.97 - 218.87 = 200.1$$

By using both methods of calculating $SS_w$, we can check our results for
computational errors.

To find the mean square between ($MS_b$) and mean square within
($MS_w$), we divide the sum of squares between ($SS_b$) and the sum of squares
within ($SS_w$) by their respective degrees of freedom (df).

$$F = \frac{MS_b}{MS_w} = \frac{SS_b/df_b}{SS_w/df_w}$$

$$F = \frac{218.87/2}{200.1/27} = \frac{109.44}{7.41} = 14.77$$


Table 9-3 shows what a typical summary table for this analysis of
variance would look like. The F of 14.77 is statistically significant at the
.01 level. That is, differences this large among three group means would
arise from sampling error alone less than 1 time in 100. We
can reject the null hypothesis with this degree of confidence.
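The sums of squares, mean squares, and F ratio worked out above can be reproduced with a short Python sketch. The function name `one_way_anova` and the dictionary keys are assumptions introduced for this illustration; the scores are those listed in Table 9-2.

```python
def one_way_anova(groups):
    """One-way analysis of variance from raw scores, using sums of squares."""
    all_scores = [score for group in groups for score in group]
    n_total = len(all_scores)
    correction = sum(all_scores) ** 2 / n_total  # (sum of X)^2 / N

    ss_total = sum(score ** 2 for score in all_scores) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between

    df_between = len(groups) - 1        # K - 1
    df_within = n_total - len(groups)   # N1 + N2 + ... - K
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_ratio": ms_between / ms_within,
    }


mathematics = [18, 22, 18, 23, 19, 24, 20, 21, 19, 25]
philosophy = [26, 27, 18, 22, 23, 19, 27, 26, 24, 26]
chemistry = [18, 14, 15, 14, 19, 21, 17, 17, 18, 19]

result = one_way_anova([mathematics, philosophy, chemistry])
print(round(result["f_ratio"], 2))
```

The sketch finds the within-groups sum of squares by subtraction, the second of the two methods described above; computing it directly from each group would serve as the same check on the arithmetic.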

However, this significant F does not pinpoint exactly where the
differences are in a pair-wise way. That is, the three groups differ significantly,
but does Group 1 differ from Group 2 and/or Group 3? Does Group 2
differ from Group 3? These questions can be answered by still further
analysis of the data using one of the several post hoc analyses available (e.g.,
Scheffé, Tukey, Newman-Keuls, Duncan). The reader should consult one
of several fine texts (e.g., Glass & Hopkins, 1984; Kirk, 1982; Winer, 1971)
for more information regarding the use and calculation of these post hoc
tests.
TABLE 9-3 Summary of Three Group Analysis of Variance

SOURCE OF VARIANCE            SS        df      MS        F

Between groups (major)        218.87     2      109.44    14.77*
Within groups (error)         200.10    27        7.41
Total                         418.97    29

*p < .01


In multiple classification or factorial analysis of variance, both the independent 
and interactive effects of two or more independent variables on one dependent 
variable may be analyzed. Not only may the effect of several independent 
variables be tested, but their interaction (how they may combine in a sig- 
nificant way) may be examined. Because human behavior and the factors 
influencing it are complex and can rarely be explained by single inde- 
pendent variable influences, this method of analysis is a powerful statistical 
tool of the behavioral researcher. 

With computers so readily available, it is rarely necessary to calculate
a factorial analysis of variance by hand. An example of a computer printout
from such an analysis is included in Chapter 10.

In factorial designs, the total variance is divided into more than two
parts. It is divided into one part for each independent variable (main effect),
one part for each interaction of two or more independent variables, and
one part for the residual, or within-group, variance. Thus, in a design with
two independent variables, the variance is divided into four parts. For
example, in our previous example comparing the performance of
mathematics, philosophy, and chemistry majors on a test of abstract reasoning,
we could also divide each of the three groups into males and females. We
then have a factorial design with two independent variables, student's major
and sex. Because there are three conditions of student major and two
conditions of sex, this is a 3 x 2 design. As shown in Chapter 10, this
results in the variance being divided into four parts: the main effect of
student's major, the main effect of student's sex, the interaction effect of
student's major with sex, and residual. From this, three separate Fs are
derived: one to test the difference among the three majors, one to test the
difference between males and females, and one to test the interaction of
student's sex and major. An example of a significant interaction was
presented in Chapter 5 (see Figure 5-3).

With the aid of computers, analysis of variance can be used with any 
number of independent variables. The only limitations are in controlling
for potentially confounding variables and interpreting complex interaction 
effects. Glass and Hopkins (1984), Kirk (1982), and Winer (1971) are ex- 
cellent resources for the student wanting more information about analysis 
of variance designs and computation. 
ANALYSIS OF COVARIANCE (ANCOVA)
AND PARTIAL CORRELATION 


Analysis of covariance and partial correlation are statistical techniques that
can remove the effect of a confounding variable's influence from a study.
Partial correlation is used to remove the effect of one variable on the
correlation between two other variables. For example, if a correlation is desired
between IQ and academic achievement and the subjects have a range of
ages, we would not want the variable of chronological age to affect the
correlation. Thus, we would partial out its effect on the other two variables,
IQ and academic achievement. This is symbolized by $r_{12.3}$, the correlation
of variables 1 and 2 with 3 removed. We can calculate the partial correlation
using the following formula:

$$r_{12.3} = \frac{r_{12} - (r_{13})(r_{23})}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$


An example from Glass and Hopkins (1984) may help to further
clarify this concept. In this example a correlation between visual perception
($X_1$) and reading performance ($X_2$) is found to be .64 for children ranging
in age ($X_3$) from six to fifteen years. Because of the wide age range, and
because children's reading and visual perception both generally improve
with age, it seems appropriate to partial out the effect of age. Given the
following correlation coefficients, we can calculate the partial correlation
of visual perception and reading performance with age removed.

$r_{12}$ [correlation of visual perception ($X_1$) and reading performance ($X_2$)] = .64
$r_{13}$ [correlation of $X_1$ and age ($X_3$)] = .80
$r_{23}$ (correlation of $X_2$ and $X_3$) = .80

$$r_{12.3} = \frac{.64 - (.80)(.80)}{\sqrt{(1 - .80^2)(1 - .80^2)}} = \frac{.64 - .64}{.36} = 0$$


As Glass and Hopkins (1984) point out: 


... one would estimate the value of $r_{12}$ for children of the same chronological
age to be zero. If enough children of the same chronological age were
available, r could be calculated for them alone to check the previous result. The
partial correlation coefficient serves the purpose of estimating $r_{12}$ for a single
level of chronological age even when there is an insufficient number of
persons at single chronological age to do the estimating by direct calculation.
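The partial correlation formula is simple enough to check with a few lines of Python. The function name `partial_correlation` is an assumption made for this sketch; the three coefficients are those of the visual perception, reading performance, and age example.

```python
import math


def partial_correlation(r12, r13, r23):
    """Correlation of variables 1 and 2 with the effect of variable 3 removed."""
    numerator = r12 - r13 * r23
    denominator = math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    return numerator / denominator


# Visual perception (1), reading performance (2), age (3)
r_age_removed = partial_correlation(0.64, 0.80, 0.80)
print(abs(r_age_removed) < 1e-9)  # effectively zero once age is partialed out
```

The comparison against a very small number, rather than against exactly zero, allows for the tiny rounding error of floating-point arithmetic.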
Analysis of covariance (ANCOVA) uses the principles of partial correlation
with analysis of variance. It is particularly appropriate when the
subjects in two or more groups are found to differ on a pretest or other 
initial variable. In this case, the effects of the pretest and/or other relevant 
variables are partialed out and the resulting adjusted means of the posttest 
scores are compared. Analysis of covariance is a method of analysis that 
enables the researcher to equate the preexperimental status of the groups 
in terms of relevant known variables. The initial status of the groups may 
be determined by pretest scores in a pretest-posttest study, or in posttest- 
only studies, by such measures as intelligence, reading scores, grade-point 
average, or previous knowledge of subject matter. Differences in the initial 
status of the groups can be removed statistically so that they can be com- 
pared as though their initial status had been equated. The scores that have 
been corrected by this procedure are known as residuals, for they are what 
remain after the inequalities have been removed. 

Analysis of covariance, used with one or more independent variables 
and one dependent variable, is an important method of analyzing exper- 
iments carried on under conditions that otherwise would be unacceptable. 
The mathematical procedures are rather complicated, and there are many
steps in computing their values. However, with the use of standard com- 
puter programs, the analysis of complex studies can be processed almost 
instantaneously. 

It should be noted that analysis of covariance is not as robust as analysis 
of variance. That is, violation of the assumptions on which analysis of 
covariance is based may make its use inappropriate. In addition, as Glass 
and Hopkins (1984) point out, ANCOVA does not transform a quasi- 
experiment into a true (randomized) experiment. There is no substitute 
for randomization. 


MULTIPLE REGRESSION AND CORRELATION 


In Chapter 8, we discussed correlation and linear regression when only two 
variables are involved. We demonstrated the prediction (regression) equation 
for estimating the value of one variable from another: Y = a + bX. 

In many cases, it is better to use more than one predictor variable to
predict an outcome, or dependent, variable. For example, a university may
use a number of variables to predict college GPA in its admission process.
High school grades and SAT or ACT scores are usually used. Ranks in
high school graduating class could also be included.

Multiple regression is the term used for predicting Y (in the example 
above, college GPA) from two or more independent variables combined. 
The formula for multiple regression is just an extension of that for linear 
regression: 
$$Y = a + b_1X_1 + b_2X_2 + \ldots$$

Y = the variable to be predicted
a = the constant or intercept
$b_1$ = the slope of the first predictor
$b_2$ = the slope of the second predictor
$X_1$ = the score on the first predictor
$X_2$ = the score on the second predictor


An example may further clarify this procedure. In Chapter 8 we gave
an example of predicting college GPA from high school GPA. We will use
the data given in that example and add SAT score (combined verbal and
quantitative).

Y = college GPA
$X_1$ = high school GPA
$X_2$ = SAT score
$r_{12}$ = correlation of high school GPA and SAT score
$r_{y1}$ = correlation of college GPA and high school GPA
$r_{y2}$ = correlation of college GPA and SAT score


The data are as follows:

$\bar{Y}$ = 2.40
$S_y$ = 0.50
$\bar{X}_1$ = 2.10
$S_{x_1}$ = 0.60
$\bar{X}_2$ = 930.00
$S_{x_2}$ = 80.00
$r_{12}$ = 0.22
$r_{y1}$ = 0.52
$r_{y2}$ = 0.66


The first step in finding b is to calculate the standardized beta weight
($\beta$):

$$\beta_1 = \frac{r_{y1} - (r_{y2})(r_{12})}{1 - (r_{12})^2} \qquad \beta_2 = \frac{r_{y2} - (r_{y1})(r_{12})}{1 - (r_{12})^2}$$

$$\beta_1 = \frac{.52 - (.66)(.22)}{1 - (.22)^2} = \frac{.375}{.952} = .394$$

$$\beta_2 = \frac{.66 - (.52)(.22)}{1 - (.22)^2} = \frac{.546}{.952} = .574$$

Using the standardized beta weights and the standard deviations, we
can calculate the raw score beta weights (b):

$$b_1 = \beta_1\left(\frac{S_y}{S_{x_1}}\right) \qquad b_2 = \beta_2\left(\frac{S_y}{S_{x_2}}\right)$$

$$b_1 = .394\left(\frac{.50}{.60}\right) = .328$$

$$b_2 = .574\left(\frac{.50}{80}\right) = .004$$


The next step is to calculate the intercept, a, from the formula:

$$a = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2 - \ldots$$

In this case:

$$a = 2.4 - .328(2.1) - .004(930)$$
$$a = 2.4 - .689 - 3.72 = -2.01$$


Finally, we can calculate the predicted college GPA for the two
students in the example from Chapter 8:

$X_{a1}$ = (student A's high school GPA) = 2.00
$X_{a2}$ = (student A's SAT score) = 900.00
$X_{b1}$ = (student B's high school GPA) = 3.10
$X_{b2}$ = (student B's SAT score) = 1100.00

$$\hat{Y} = a + b_1X_1 + b_2X_2 \ldots$$

$$\hat{Y}_a = -2.01 + .328(2.00) + .004(900)$$
$$\hat{Y}_a = -2.01 + .656 + 3.6 = 2.25$$

$$\hat{Y}_b = -2.01 + .328(3.10) + .004(1100)$$
$$\hat{Y}_b = -2.01 + 1.02 + 4.4 = 3.41$$


Student A, with the below-average high school GPA and SAT score,
is predicted to have a below-average college GPA, and student B, with the
above-average high school GPA and SAT score, is expected to have an
above-average college GPA. When we compare these findings with the
regression results in Chapter 8, we see that the addition of a confirming
score resulted in a prediction further above or below the average. That is,
when we included an SAT score for student A that was to the same side
of (below) the mean as his or her high school GPA, the result was a
predicted college GPA further below the average than the one predicted
by high school GPA alone (2.25 versus 2.36). Conversely, when we included
an SAT score for student B that was above the mean, as was his or her
high school GPA, the result was a predicted college GPA further above the
average than the one predicted by high school GPA alone (3.41 versus
2.83).
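The steps of the two-predictor calculation can be gathered into one Python sketch. The function and variable names are assumptions introduced for this illustration. The sketch carries full precision instead of rounding each weight, so its intercept and predictions differ somewhat from the hand calculation, most noticeably because the hand calculation rounds the very small SAT weight to .004.

```python
def two_predictor_regression(mean_y, sd_y, mean_x1, sd_x1, mean_x2, sd_x2,
                             r12, ry1, ry2):
    """Intercept and raw score weights for predicting Y from two predictors."""
    shared = 1 - r12 ** 2
    beta1 = (ry1 - ry2 * r12) / shared  # standardized weight, predictor 1
    beta2 = (ry2 - ry1 * r12) / shared  # standardized weight, predictor 2
    b1 = beta1 * sd_y / sd_x1           # raw score weight, predictor 1
    b2 = beta2 * sd_y / sd_x2           # raw score weight, predictor 2
    a = mean_y - b1 * mean_x1 - b2 * mean_x2
    return a, b1, b2, beta1, beta2


def predict(a, b1, b2, x1, x2):
    """Predicted Y for one person's scores on the two predictors."""
    return a + b1 * x1 + b2 * x2


a, b1, b2, beta1, beta2 = two_predictor_regression(
    mean_y=2.40, sd_y=0.50,
    mean_x1=2.10, sd_x1=0.60,
    mean_x2=930.0, sd_x2=80.0,
    r12=0.22, ry1=0.52, ry2=0.66,
)

student_a = predict(a, b1, b2, 2.00, 900)   # below average on both predictors
student_b = predict(a, b1, b2, 3.10, 1100)  # above average on both predictors
print(round(beta1, 3), round(beta2, 3), round(b1, 3))
```

Either way of carrying the arithmetic gives the same pattern: student A is predicted to fall below the mean college GPA of 2.40 and student B above it.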

If we had added a disconfirming SAT score to high school GPA, it
would have had the opposite of the effect described above. For example,
if student A had an SAT score of 1080 instead of 900, the prediction
would be:

$$\hat{Y}_a = -2.01 + .328(2.00) + .004(1080)$$
$$\hat{Y}_a = -2.01 + .656 + 4.32 = 2.97$$


In either case, confirming or disconfirming, the addition of related
variables to a prediction equation will result in more accurate predictions.

We should point out that when we write of “adding” a variable to the 
equation, we do not mean that the amount of the prediction from one 
variable is just added to by the amount of prediction from a second, and/ 


overlap (relatedness) of the predictor variables. The result is that the second
variable only “adds” the amount of prediction that it has independent of 
the first variable. This is necessary since other attributes (variables) may 
directly influence more than one of the predictor variables. For instance, 


the percent of the variance of the predicted variable that is due to factors 
other than the predictor variables.

In the previous multiple regression example, the proportion of the
variance of college GPA that is due to a combination of high school GPA
and SAT scores is $R^2 = \beta_1 r_{y1} + \beta_2 r_{y2} = .394(.52) + .574(.66) = .583$,
so the multiple correlation (the correlation between actual college GPA and
predicted college GPA) is R = .764. Fifty-eight percent of the
variance of college GPA is explained by high school grades and SAT scores.
Because $1 - R^2$ is .417, about 42 percent of the variance of college GPA is
due to other factors such as motivation, involvement in extracurricular
activities, measurement error, and so on. This gives a good idea of just how
accurate we can expect individual predictions to be. Given these data, we
would expect to be able to predict broad categories such as "probable fail,"
"borderline," "probable pass," and "probable high GPA." To expect to
accurately predict a person's college GPA would be unrealistic.

For greater detail, the reader should consult a more advanced text
that specializes in this topic (e.g., Cohen & Cohen, 1975; Neter, Wasserman,
& Kutner, 1985). A computer analysis of a more intricate multiple
regression is presented in Chapter 10.


NONPARAMETRIC TESTS 


The parametric tests presented in this chapter are generally quite robust; 
that is, they are useful even when some of their mathematical assumptions 
are violated. However, sometimes it is necessary, or preferable, to use a 
nonparametric or distribution free test. 

Nonparametric tests are appropriate when 


1. The nature of the population distribution from which samples are 
drawn is not known to be normal. 

2. The variables are expressed in nominal form (classified in categories 
and represented by frequency counts). 

3. The variables are expressed in ordinal form (ranked in order, ex- 
pressed as first, second, third, etc.). 


Nonparametric tests, because they are based upon counted or ranked 
data rather than on measured values, are less precise, have less power than 
parametric tests, and are not as likely to reject a null hypothesis when it is 
false. 

Many statisticians suggest that parametric tests be used, if possible, 
and that nonparametric tests be used only when parametric assumptions 
cannot be met. Others argue that nonparametric tests have greater merit 
than is often attributed to them because their validity is not based upon 
assumptions about the nature of the population distribution, assumptions 
that are so frequently ignored or violated by researchers employing par- 
ametric tests. 

Of the many nonparametric tests, two of the most frequently used
are described and illustrated here: the Chi square (χ²) test, and the
Mann-Whitney Test.


The Chi Square Test (χ²)

The χ² test applies only to discrete data, counted rather than measured
values. It is a test of independence, the idea that one variable is not affected
by, or related to, another variable. The χ² is not a measure of the degree
of relationship. It is merely used to estimate the likelihood that some factor
other than chance (sampling error) accounts for the apparent relationship.
Because the null hypothesis states that there is no relationship (the variables


are independent), the test merely evaluates the probability that the observed 


value doesn’t necessarily in- 
dicate a cause-and-effect relationship, a limitation that was observed when 
interpreting a coefficient of correlation. A significant χ² finding indicates
that the variables probably do not exhibit the quality of independence, that 
they tend to be systematically related, and that the relationship transcends 


There are situations when the theoretical or expected frequencies 
must be computed from the distribution. Let us assume that 200 residents 
of a college dormitory major in business, liberal arts, or engineering. Is 
the variable, major, related to the number of cigarettes smoked per day 
on the average for a 3-week period? The null hypothesis would state that 
major is not related to the number of cigarettes smoked; that the variables 
major and frequency of cigarette smoking are independent. 

Chi square observations should be organized in crossbreak form. In
each category, the expected frequency ($f_e$), as contrasted to the observed
frequency ($f_o$), is the number of cases that would appear if there were
no systematic relationship between the variables, a pure chance
relationship.


                 Number of Cigarettes Smoked per Day

MAJOR            NONE       1-15       MORE THAN 15     TOTAL
Business          6 (12)    60 (56)    14 (12)            80*
Liberal Arts     14 (12)    48 (56)     8 (12)            80*
Engineering      10 (6)     32 (28)     8 (6)             40*
Total            30**      140**       30**              200 (grand total)

*  Σf row
** Σf column

Numbers represent the actual observed frequencies ($f_o$).
Numbers in parentheses represent the expected frequencies ($f_e$).

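The expected frequency for each of the 9 cells is computed by the
formula

$$f_e = \frac{(\Sigma f \text{ column})(\Sigma f \text{ row})}{\text{grand total}}$$

Computation of expected frequencies ($f_e$):

$$\frac{(30)(80)}{200} = (12) \qquad \frac{(140)(80)}{200} = (56) \qquad \frac{(30)(80)}{200} = (12)$$

$$\frac{(30)(80)}{200} = (12) \qquad \frac{(140)(80)}{200} = (56) \qquad \frac{(30)(80)}{200} = (12)$$

$$\frac{(30)(40)}{200} = (6) \qquad \frac{(140)(40)}{200} = (28) \qquad \frac{(30)(40)}{200} = (6)$$
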
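Computation of the χ² value:

$$\chi^2 = \Sigma\left[\frac{(f_o - f_e)^2}{f_e}\right]$$

$$\frac{(6 - 12)^2}{12} = 3.00 \qquad \frac{(60 - 56)^2}{56} = .29 \qquad \frac{(14 - 12)^2}{12} = .33$$

$$\frac{(14 - 12)^2}{12} = .33 \qquad \frac{(48 - 56)^2}{56} = 1.14 \qquad \frac{(8 - 12)^2}{12} = 1.33$$

$$\frac{(10 - 6)^2}{6} = 2.67 \qquad \frac{(32 - 28)^2}{28} = .57 \qquad \frac{(8 - 6)^2}{6} = .67$$

$$\chi^2 = 3.00 + .29 + .33 + .33 + 1.14 + 1.33 + 2.67 + .57 + .67 = 10.33$$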


degrees of freedom = (rows - 1)(columns - 1)
                   = (3 - 1)(3 - 1) = (2)(2) = 4

χ² critical values for 4 degrees of freedom (see Appendix E):

 .01       .05          χ² = 10.33
13.28      9.49


The test indicates that there is a significant relationship between major 
and number of cigarettes smoked at the .05 but not at the .01 level of 
significance. If we wished to answer the question, “Is there a relationship 
between being a business major and number of cigarettes smoked?” we 
would combine the liberal arts and engineering categories and use a χ²
table with six rather than nine cells. 


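                 Number of Cigarettes Smoked per Day

MAJOR            NONE       1-15       MORE THAN 15     TOTAL
Business          6 (12)    60 (56)    14 (12)            80
Nonbusiness      24 (18)    80 (84)    16 (18)           120
Total            30        140         30                200

$$\frac{(30)(80)}{200} = (12) \qquad \frac{(140)(80)}{200} = (56) \qquad \frac{(30)(80)}{200} = (12)$$

$$\frac{(30)(120)}{200} = (18) \qquad \frac{(140)(120)}{200} = (84) \qquad \frac{(30)(120)}{200} = (18)$$

$$\chi^2 = \frac{(6 - 12)^2}{12} + \frac{(60 - 56)^2}{56} + \frac{(14 - 12)^2}{12} + \frac{(24 - 18)^2}{18} + \frac{(80 - 84)^2}{84} + \frac{(16 - 18)^2}{18}$$

$$\chi^2 = 3.00 + .29 + .33 + 2.00 + .19 + .22 = 6.03$$

χ² critical values at 2 degrees of freedom:

 .01       .05
 9.21      5.99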

The null hypothesis may be rejected at the .05 but not at the .01 level 
of significance. 
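The expected frequencies and the χ² value for a crossbreak table can be computed together in a short Python sketch. The function name `chi_square` is an assumption made for this illustration; the observed counts are those of the business and nonbusiness table (6, 60, 14 and 24, 80, 16).

```python
def chi_square(observed):
    """Chi square value and degrees of freedom for a table of observed counts.

    Each expected frequency is (row total)(column total) / grand total.
    """
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)

    total = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * column_totals[j] / grand_total
            total += (f_o - f_e) ** 2 / f_e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return total, df


business_table = [
    [6, 60, 14],   # Business: none, 1-15, more than 15
    [24, 80, 16],  # Nonbusiness
]
chi2_value, df = chi_square(business_table)
print(round(chi2_value, 2), df)
```

The computed value is then compared with the critical values for the resulting degrees of freedom, here 5.99 at the .05 level and 9.21 at the .01 level.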

In a 2 x 2 table (4 cells) with 1 degree of freedom, there is a simple
formula that eliminates the need to calculate the theoretical frequencies
for each cell.

$$\chi^2 = \frac{N\left[\,|AD - BC|\,\right]^2}{(A + B)(C + D)(A + C)(B + D)}$$

Terms in a 2 x 2 table:

A   B
C   D

Let us use an example employing this formula. A random sample of 
auto drivers revealed the relationship between experiences of those who 
had taken a course in driver education and those who had not. 


                          REPORTED     NO
                          ACCIDENT     ACCIDENT     TOTAL

Had driver's education    44 (A)       10 (B)         54
No driver's education     81 (C)       35 (D)        116
Total                    125           45            170

This is a 2 x 2 table with one degree of freedom.

$$\chi^2 = \frac{170\left[\,|(44 \times 35) - (10 \times 81)|\,\right]^2}{(44 + 10)(81 + 35)(44 + 81)(10 + 35)}$$

$$\chi^2 = \frac{170\left[\,|1540 - 810|\,\right]^2}{(54)(116)(125)(45)}$$

$$\chi^2 = \frac{170(730)^2}{35{,}235{,}000} = \frac{90{,}593{,}000}{35{,}235{,}000} = 2.57$$
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The x? value does not equal or exceed) the critical x? value (3.84) 
necessary to reject the null hypothesis at the .05 level of significance. There 
seems to be no significant relationship between completing the course in 
driver education and the number of individuals who had recorded auto 
accidents. 
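
As a check on the shortcut formula, a minimal Python sketch follows. The function name is illustrative; the cell counts A, B, C, and D are those of the driver education table, with A + B taken as the row total of 54 shown there.

```python
# Sketch: the 2 x 2 shortcut formula, chi2 = N(|AD - BC|)^2 / [(A+B)(C+D)(A+C)(B+D)].
# The function name is hypothetical; the cell counts come from the text.

def chi_square_2x2(a, b, c, d):
    """Chi-square for a 2 x 2 table without computing expected frequencies."""
    n = a + b + c + d
    numerator = n * abs(a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# A = 44, B = 10 (had driver's education); C = 81, D = 35 (no driver's education)
chi2 = chi_square_2x2(44, 10, 81, 35)
critical_05 = 3.84  # critical value for 1 df at the .05 level, as given in the text

print(f"chi-square = {chi2:.2f}")
print("reject" if chi2 >= critical_05 else "do not reject", "the null hypothesis")
```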


Yates' correction for continuity. In computing a chi square value for a
2 x 2 table with one degree of freedom, the formula is modified when
any cell has a frequency count of fewer than 10. The modified formula differs
from the previous one in that N/2 is subtracted from |AD − BC| before squaring.

\[
\chi^2 = \frac{N\left(|AD - BC| - \frac{N}{2}\right)^2}{(A + B)(C + D)(A + C)(B + D)}
\]


Example: A pharmaceutical company wished to evaluate the effective-
ness of X-40, a recently developed headache relief pill.

Two randomly selected and assigned samples of patients who com-
plained of headaches were given pills. The experimental group was given
6 X-40 pills daily and the control group was given 6 placebos (or sugar
pills) daily, although they thought that they were receiving medication.
After a week they reported their experience.


                       X-40      PLACEBO    TOTAL

Headaches relieved     30 (A)    40 (B)      70
Headaches continued     4 (C)    10 (D)      14
Total                  34        50          84


A χ² test using a 2 x 2 table at 1 degree of freedom was applied,
with Yates' correction. Was the effectiveness of the X-40 medication sig-
nificant at the .05 level?

\[
\chi^2 = \frac{84\,[|(30 \times 10) - (40 \times 4)| - 42]^2}{(30 + 40)(4 + 10)(30 + 4)(40 + 10)}
       = \frac{84\,[|300 - 160| - 42]^2}{(70)(14)(34)(50)}
\]

\[
= \frac{84\,(98)^2}{1{,}666{,}000} = \frac{84\,(9604)}{1{,}666{,}000} = \frac{806{,}736}{1{,}666{,}000}
\]

\[
\chi^2 = .48
\]


The computed χ² is far below the χ² critical value (3.84) necessary for
the rejection of the null hypothesis at the .05 level. The researcher con-
cludes that the null hypothesis is not rejected: there is no significant re-
lationship between the use of X-40 pills at this dosage and headache relief.
Any apparent effectiveness was probably the result of sampling error.
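
The corrected formula can be sketched in Python in the same way. The function name is illustrative; the sketch follows the formula as the text states it, subtracting N/2 from |AD − BC| before squaring, and uses the X-40 cell counts.

```python
# Sketch: 2 x 2 chi-square with Yates' correction for continuity,
# following the formula in the text. The function name is hypothetical.

def chi_square_2x2_yates(a, b, c, d):
    """N(|AD - BC| - N/2)^2 / [(A+B)(C+D)(A+C)(B+D)]."""
    n = a + b + c + d
    corrected_difference = abs(a * d - b * c) - n / 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * corrected_difference ** 2 / denominator

# A = 30, B = 40 (headaches relieved); C = 4, D = 10 (headaches continued)
chi2_yates = chi_square_2x2_yates(30, 40, 4, 10)
critical_05 = 3.84  # critical value for 1 df at the .05 level, as given in the text

print(f"corrected chi-square = {chi2_yates:.2f}")
print("reject" if chi2_yates >= critical_05 else "do not reject", "the null hypothesis")
```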


The Mann-Whitney Test 


The basic computation is U, and in experiments using small samples,
the significance of an observed U may be determined by the U critical
values of the Mann-Whitney tables.

When the size of either of the groups is more than 20, the sampling
distribution of U rapidly approaches the normal distribution, and the null
hypothesis may be tested with reference to the z critical values of the
normal probability table.

The values of the combined samples, N₁ and N₂, are ranked from the
lowest to the highest rank, irrespective of groups, rank 1 to the lowest score,
rank 2 to the next lowest, and so forth. Then the ranks of each sample
group are summed individually and represented as ΣR₁ and ΣR₂.

Two Us are calculated by the formulas:

\[
\text{a.}\quad U_1 = N_1 N_2 + \frac{N_1(N_1 + 1)}{2} - \Sigma R_1
\]

\[
\text{b.}\quad U_2 = N_1 N_2 + \frac{N_2(N_2 + 1)}{2} - \Sigma R_2
\]

N₁ = number in one group       ΣR₁ = sum of ranks in one group
N₂ = number in second group    ΣR₂ = sum of ranks in second group


Only one U need be calculated, for the other can be easily computed
by the formula

\[
U_1 = N_1 N_2 - U_2
\]

It is the smaller value of U that is used when consulting the Mann-
Whitney U table.

The z value of U can be determined by the formula

\[
z = \frac{U - \dfrac{N_1 N_2}{2}}{\sqrt{\dfrac{N_1 N_2 (N_1 + N_2 + 1)}{12}}}
\]

TABLE 9-4 Performance Scores of Students Taught by Method A or by Method B

 A     RANK       B     RANK
50      3        49      2
60      8        90     36
89     35        88     33.5
94     38        76     21
82     28        92     37
75     20        81     27
63     10        55      7
52      5        64     11
97     40        84     30
95     39        51      4
83     29        47      1
80     25.5      70     15
77     22        66     12
80     25.5      69     14
88     33.5      87     32
78     23        74     19
85     31        71     16
79     24        61      9
72     17        55      6
68     13        73     18

N₁ = 20   ΣR₁ = 469.5    N₂ = 20   ΣR₂ = 350.5

It does not matter which U (the larger or the smaller) is used in the
computation of z. The sign of the z will depend on which is used, but the
numerical value will be identical.

For example, a teacher wishes to evaluate the effect of two methods
of teaching reading to two groups of 20 randomly assigned students, drawn
from the same population (see Table 9-4).

The null hypothesis proposed is that there is no significant difference 
between the performance of the students taught by Method A and the 
students taught by Method B. 

After a period of four months' exposure to the two teaching methods, 
the scores of the students on a standardized achievement test were re- 
corded. All scores were ranked from lowest to highest and the Mann- 
Whitney test was used to test the null hypothesis at the .05 level of signif- 
icance. 

\[
U_1 = (20)(20) + \frac{20(21)}{2} - 469.50 = 400 + 210 - 469.50 = 140.50
\]

\[
U_2 = (20)(20) + \frac{20(21)}{2} - 350.50 = 400 + 210 - 350.50 = 259.50
\]

Check:

\[
U_1 = N_1 N_2 - U_2 \qquad 140.50 = 400 - 259.50 \qquad 140.50 = 140.50
\]

\[
z = \frac{U - \dfrac{N_1 N_2}{2}}{\sqrt{\dfrac{N_1 N_2 (N_1 + N_2 + 1)}{12}}}
  = \frac{140.50 - \dfrac{400}{2}}{\sqrt{\dfrac{(20)(20)(41)}{12}}}
  = \frac{-59.50}{\sqrt{1366.67}} = \frac{-59.50}{36.97}
\]

\[
z = -1.61
\]


Because the observed z value of −1.61 did not equal or exceed the z
critical value of 1.96 for a two-tailed test at the .05 level, the null hypothesis
was not rejected. The difference was not significant, and the apparent 
superior performance of the Method A group could well have resulted 
from sampling error. 

For further information on these and other nonparametric tests, we 
recommend Hollander and Wolfe (1973) and Siegel (1956). 
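
The ranking and U computations for Table 9-4 can be reproduced with a short Python sketch. The function names are illustrative and not part of the text. The scores are those printed in the table; the two Method B scores of 55 are treated here as a tie and given the average rank, which leaves the rank sums unchanged.

```python
# Sketch: Mann-Whitney U with the normal (z) approximation, as described in the text.
# Function names are hypothetical; scores are those of Table 9-4.
from math import sqrt

def average_ranks(values):
    """Rank from lowest (1) to highest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def mann_whitney(group_1, group_2):
    """Return (U1, U2, z), with z computed from U1."""
    n1, n2 = len(group_1), len(group_2)
    ranks = average_ranks(list(group_1) + list(group_2))
    sum_r1 = sum(ranks[:n1])
    sum_r2 = sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - sum_r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - sum_r2
    z = (u1 - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, u2, z

method_a = [50, 60, 89, 94, 82, 75, 63, 52, 97, 95,
            83, 80, 77, 80, 88, 78, 85, 79, 72, 68]
method_b = [49, 90, 88, 76, 92, 81, 55, 64, 84, 51,
            47, 70, 66, 69, 87, 74, 71, 61, 55, 73]

u1, u2, z = mann_whitney(method_a, method_b)
print(f"U1 = {u1}, U2 = {u2}, z = {z:.2f}")
print("reject" if abs(z) >= 1.96 else "do not reject", "the null hypothesis at .05")
```

Using U2 in place of U1 in the z formula would change only the sign of z, as noted above.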


SUMMARY


acteristics of populations, and although samples from the same population
will differ from one another, the nature of their variation is reasonably


(parameters) with known probabilities of error.

The pioneering contributions of Sir Ronald Fisher and Karl Pearson
to statistics and scientific method, and William Sealy Gosset to small-sam-
pling theory, have made practical the analysis of many of the types of
problems encountered in psychology and education as well as in agricultural
and biological research, where they were first applied.

Parametric statistical treatment of data is based upon certain assump-
tions about the nature of distributions and the types of measures used.
Nonparametric statistical treatment makes possible useful inferences with-
out assumptions about the nature of data distributions. Each type makes
a significant contribution to the analysis of data relationships.

Statistical decisions are not made with certainty but are based upon
probability estimates. The central limit theorem, sampling error, variance,
the null hypothesis, levels of significance, and one-tailed and two-tailed
tests have been explained and illustrated. Although this treatment has been
brief and necessarily incomplete, the presentation of concepts may help
the consumer of research to understand many simple research reports.
Students who aspire to significant research activity, or who wish to interpret
complex research studies with understanding, will need additional back-
ground in statistics and experimental design. They will find it helpful to
participate in research seminars and to acquire competence through ap-
prenticeship with scholars who are making contributions to knowledge
through their own research activities.


EXERCISES

1. Why is it stronger logic to be able to reject a negative hypothesis than to try to 
confirm a positive one? 

2. A statistical test of significance would have no useful purpose in a purely 
descriptive study in which sampling was not involved. Do you agree? Why? 

3. When a statistical test determines that a finding is significant at the .05 level,
it indicates that there is a 5/100 probability that the relationship was merely the
result of sampling error. Do you agree? Why?

4. Any hypothesis that can be rejected at the .05 level of significance can surely 
be rejected at the .01 level. Do you agree? 

5. The t critical value necessary for the rejection of a null hypothesis (at a given 
level of significance and for a given number of degrees of freedom) is higher 
for a one-tailed test than it is for a two-tailed test. Do you agree? Why? 

6. A manufacturer guaranteed that a particular type of steel cable had a mean
tensile strength of 2000 pounds with a standard deviation of 200 pounds. In a
shipment, 16 lengths of the cable were submitted to a test for breaking strength.
The mean breaking strength was 1900 pounds. Using a one-tailed test at the 
.05 level of significance, determine whether the shipment met the manufac- 
turer's specifications. 

7. Two samples of mathematics students took a standardized engineering aptitude
test. Using a two-tailed test at the .05 level of significance, determine whether
the two groups were random samples from the same population.

GROUP A      GROUP B
N = 25       N = 30
X̄ = 80       X̄ = 88
S = 8        S = 9

8. An achievement test in spelling was administered to two randomly selected
fifth-grade groups of students from two schools. Test the null hypothesis that
there was no significant difference in achievement between the two fifth-grade
populations from which the samples were selected at the .05 level of signifi-
cance. Use the method of separate variances.

SCHOOL A     SCHOOL B
N = 40       N = 45
X̄ = 82       X̄ = 86
S = 12.60    S = 14.15


9. One group of rats was given a vitamin supplement while the other group re-
ceived a conventional diet. The rats were randomly assigned. Test the hy-

X              C
N = 12         N = 16
S = 15.50 g    S = 12.90
X̄ = 140 g      X̄ = 190 g


hypothesis that there was no significant difference between the mean gasoline
mileage of the two makes of cars.

11. Calculate the number of degrees of freedom when

a. computing the statistical significance of a coefficient of correlation.

b. determining the significance between two means.

c. a 2 x 2 χ² table computation is involved.

d. a 3 x 5 χ² table computation is involved.

12. In a survey to determine high school students' preference for a soft drink, the
results were:

          BRAND A    BRAND B    BRAND C
Boys        25         30         52
Girls       46         22         28

Was there any relationship between the brand preference and the gender of
the consumers?

13. A group of 50 college freshmen was randomly assigned to experimental and
control groups to determine the effectiveness of a counseling program upon 
academic averages. Use the Mann-Whitney test to test the null hypothesis that 
there was no difference between the academic performance of the experimental 
and control groups at the .05 level of significance. 


EXPERIMENTAL CONTROL 


2.10 2.01 
3.00 2.69 
1.96 3.07 
2.04 2.14 
3.27 2.82 
3.60 2.57 
3.80 3.44 
2.75 4.00 
1.98 3.01 
2.00 2.55 
2.98 2.77 
3.10 3.09 
3.69 2.72 
2.66 3.34 
2.56 2.81 
2.50 3.05 
3.77 2.67 
2.40 1.90 
3.20 1.70 
1.71 1.57 
3.04 1.39 
2.06 2.09 
2.86 3.68 
3.02 2.41
1.88 2.83 


14. Compute the t value of the coefficient of correlation:

r = +.30
N = 18

15. Calculate Y given the following information:


a= 112 
Bye 
b= 4 
b,- 3 
X₁ = 70
X₂ = 60


COMPUTER DATA ANALYSIS


The purpose of this chapter is to show how computers can be used in 
analyzing data. Computers can perform calculations in just a few seconds 
that human beings would need weeks to do by hand. Although computers, 
as we know them today, have only been in existence for approximately 40 
years, all of our daily lives are affected by them. 

The microchip has made possible small computers that are within the 
financial reach of many Americans. As the price of these small computers 
comes down and their capabilities increase, more homes and small busi- 
nesses will have computers. Three of the computer programs presented 
later in this chapter were run using a large university "main frame" com- 
puter. However, comparable programs are already available for microcom- 
puters, and we have included an example using one. 


THE COMPUTER 


The electronic digital computer is one of the most versatile and ingenious
developments of the technological age. It is unlikely that complex modern
institutions of business, finance, and government would have developed
so rapidly without the contributions of the computer.

To the researcher, the use of the computer to analyze complex data
has made complicated research designs practical. Performing calculations
almost at the speed of light, the computer has become one of the most 
useful research tools in the physical and behavioral sciences as well as in 
the humanities. 

An early predecessor of the modern computer was a mechanical de- 
vice developed by Charles Babbage, a nineteenth-century English mathe- 
matician. Late in that century Herman Hollerith, a director of the U.S. 
Census Bureau, devised a hole-punched card to aid in the more efficient 
processing of census data. The punched card was a significant development, 
for it has been a very important part of modern computer data processing. 

In the mid-1940s, an electrical impulse computer was devised with 
circuits employing thousands of vacuum tubes. These computers, which 
were very large and cumbersome, required a great deal of space. The heat 
generated by the vacuum tubes required extensive air-conditioning equip- 
ment to prevent heat damage, and the uncertain life of the vacuum tubes 
caused frequent malfunction. 

With the development of transistorized components, replacement of 
the vacuum tubes, miniaturization, increased component reliability, elim- 
ination of heat dissipation problems, and other improvements, the com- 
puter has become a much more effective device for the storage, processing, 
and retrieval of information. 

The most advanced current models have incorporated microcircuitry 


a second. 

Computer technology includes four basic functions: input, storage, 
control, and output. Input entails entering information or data into the 
computer. This is generally done through a cathode ray tube (CRT) ter- 


optical scanning readers that translate printed page information, or mag- 
netic tape. Once information is inputted, it is stored for eventual use on 
magnetic cores, tapes, or disks. Control of stored information, as well as 
new input, is achieved through programs written in one of several possible 


preprepared programs to perform a variety of statistical procedures, so
the researcher usually does not need to write his or her own. The output 
or retrieval process transfers the processed information or data from the 


computer to the researcher, using one of a number of devices to com-
municate the results. The output may be displayed on a CRT screen, printed
on paper, or recorded on a tape or disk. 

The computer can perform many statistical calculations easily and 
quickly. Computation of means, standard deviations, correlation coeffi-
cients, t tests, analysis of variance, analysis of covariance, multiple regres-
sion, factor analysis, and various nonparametric analyses are just a few of
the programs and subprograms that are available at computer centers. 

It has been said that the computer makes no mistakes, but program 
writers do make mistakes, and any directions given to the computer are 
faithfully executed. The computer doesn't think; it can only execute the 
directions of a thinking person. If poor data or faulty programs are intro- 
duced into the computer, the data analysis will be meaningless. The expres- 
sion "garbage in, garbage out" describes the problem quite well. It is critical 
when using canned programs to carefully follow the appropriate program 
syntax. If a comma or slash is missing, the program may stop processing 
the data or, worse yet, process the data incorrectly. 

With the large "main frame" computers of university and large busi- 
ness computer centers, hundreds of users at different terminals can com- 
municate with the computer at a single time. Computer programs (software) 
are available at these centers for many purposes, including statistical anal-
yses. The canned programs include the Statistical Package for the Social Sciences¹
(versions include SPSS® and SPSS-X®), Statistical Analysis System² (SAS®),
and others. Though the actual programs, input procedures (syntax), and
output (printouts) differ for these package programs, they are similar in
their capabilities and the variety of statistical analyses that can be performed
using them. Perhaps the most widely used are those programs published
by SPSS. Which set of programs is used, however, depends on the user's
needs and preference. Examples of programs from the SPSS-X and SAS
systems, run on the computer facilities of the University of Illinois at Chi-
cago, will be presented later in this chapter.

Microcomputers include a wide range of equipment from small, low- 
cost computers for games and other purposes to computers that cost several 
thousands of dollars and can perform a variety of functions. Depending 
on the model and storage capabilities, there are programs available, SPSS 
among them, that can calculate any of the statistical analyses presented in 
this text, and many more. An example of a program from SPSS-PC+®,
run on an IBM personal computer with a hard disk, will be presented later
in this chapter.


1 SPSS, SPSS-X, and SPSS-PC+ are trademarks of SPSS, Inc. of Chicago, Ill., for its
proprietary computer software.
2 SAS is the registered trademark of SAS Institute, Inc., Cary, N.C.


DATA ORGANIZATION


Prior to the input stage of data analysis comes the organizing of data for 
proper input into the computer system. Regardless of the type of computer 
or program to be used, if data are poorly organized the researcher will 
have trouble analyzing their meaning.
The data must first be coded. Categorical data, such as a person's sex 
or occupation, need to be given a number to represent them. For instance: 


SEX            OCCUPATION
1 = Female     1 = Farmer
2 = Male       2 = Service
               3 = Professional


The researcher may also want to convert interval or ratio data into cate- 
gories and code them. For instance, 


IQ LEVEL          INCOME

1 = 120 to 139    1 = 40,000 and over
2 = 100 to 119    2 = 30,000 to 39,999
3 = 80 to 99      3 = 20,000 to 29,999
4 = 60 to 79      4 = below 20,000
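
A coding scheme of this kind can be written down as a small Python sketch. The dictionary and function names are illustrative; the codes and category boundaries are the ones listed above, and treating an IQ outside the listed ranges as an error is an assumption of this sketch.

```python
# Sketch: coding categorical data and converting interval data to coded categories.
# Names are hypothetical; the codes and ranges are those shown in the text.

SEX_CODES = {"Female": 1, "Male": 2}
OCCUPATION_CODES = {"Farmer": 1, "Service": 2, "Professional": 3}

def code_iq(iq):
    """Convert an IQ score to its category code (1 = 120 to 139 ... 4 = 60 to 79)."""
    if 120 <= iq <= 139:
        return 1
    if 100 <= iq <= 119:
        return 2
    if 80 <= iq <= 99:
        return 3
    if 60 <= iq <= 79:
        return 4
    # Assumption: scores outside the listed ranges are flagged rather than coded.
    raise ValueError(f"IQ {iq} is outside the coded ranges")

def code_income(income):
    """Convert an income to its category code (1 = 40,000 and over ... 4 = below 20,000)."""
    if income >= 40000:
        return 1
    if income >= 30000:
        return 2
    if income >= 20000:
        return 3
    return 4

# One hypothetical subject, coded for entry.
subject = {"sex": "Female", "occupation": "Service", "iq": 104, "income": 31500}
coded = (
    SEX_CODES[subject["sex"]],
    OCCUPATION_CODES[subject["occupation"]],
    code_iq(subject["iq"]),
    code_income(subject["income"]),
)
print(coded)
```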


The next step is to assign each variable to the spaces in which it will 
always be placed. Most systems call for a maximum of 80 columns per line. 
Once the researcher knows how many spaces each variable will occupy, the 
variables can be assigned to their column numbers (from 1 to 80). If more 
than 80 spaces are needed for each subject, then two or more lines will
need to be assigned. The first columns will usually be the individual subject
identity (ID) number. If fewer than 100 subjects are included, two spaces
starting with 01 will be needed. Sometimes a researcher may include one
or more attributes in the ID number, thereby increasing the number of
columns needed. For instance, in the data set used in the analysis of variance

losophy major, male, and the thirteenth subject coded. When a large num-
ber of variables are used in a study, separating the variables with spaces
will make the data easier to comprehend and easier to use with some 
programs. In any case, the researcher needs to have a list that shows which 
variables are represented in which column numbers.

Figure 10-1 shows how data might look when coded on to a form.
Note how the variables are separated from each other by a space left
between them. Figure 10-2 is the list used to determine which columns
contained the different variables.

COLUMN
NUMBER    VARIABLE

1-4       ID Number
6-7       Expressive Language (EL)
9-10      Receptive Language (RL)
12-13     Object Permanence (OP)
15-16     Means-End (ME)
18-19     Vocal Imitation (VI)
21-22     Gestural Imitation (GI)
24-25     Causality
27-28     Spatial Relations
30-31     Responsibility (ABS9)
33-34     Socialization (ABS10)
36-37     Independent Functioning (ABS1)
39-40     Physical Development (ABS2)
42-43     Language (ABS4)
45-46     Self-direction (ABS8)
48-49     Mental Age (MA)
51-53     Chronological Age (CA)

FIGURE 10-2 Variable list for coded data.
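
A variable list like the one in Figure 10-2 translates directly into a routine for reading fixed-column records. The Python sketch below is illustrative only: the short names CAUS and SPAT (for Causality and Spatial Relations) and the sample record are hypothetical, while the column positions are those of the figure.

```python
# Sketch: reading one fixed-column record with the Figure 10-2 column layout.
# Column numbers are 1-based and inclusive, as in the figure.
# CAUS and SPAT are made-up short names; the sample line is invented data.

COLUMNS = {
    "ID": (1, 4), "EL": (6, 7), "RL": (9, 10), "OP": (12, 13),
    "ME": (15, 16), "VI": (18, 19), "GI": (21, 22), "CAUS": (24, 25),
    "SPAT": (27, 28), "ABS9": (30, 31), "ABS10": (33, 34), "ABS1": (36, 37),
    "ABS2": (39, 40), "ABS4": (42, 43), "ABS8": (45, 46), "MA": (48, 49),
    "CA": (51, 53),
}

def parse_record(line):
    """Slice each variable out of its assigned columns; blank fields become None."""
    record = {}
    for name, (start, end) in COLUMNS.items():
        field = line[start - 1:end].strip()
        record[name] = int(field) if field else None
    return record

# A hypothetical 53-column record with a space between variables.
sample_line = "0101 12 15 08 09 11 10 07 06 05 04 20 18 16 14 30 072"
record = parse_record(sample_line)
print(record["ID"], record["RL"], record["MA"], record["CA"])
```

Keeping the column list in one place, as the text recommends, means the same table documents the data and drives the reading routine.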

Survey researchers frequently have a system for coding and recording
their data prior to distributing the questionnaires. For example, the ques-


Once the data are coded, they are ready to be stored in the computer. The 
researcher then must decide on the descriptive and inferential statistics 
desired and the program(s) that he or she will use to analyze the data. The
selection of appropriate statistics will generally depend on the design of 


example, an IBM personal computer with a hard disk was used. In each 
of these analyses, the data may be presented with the control statements

prise the title of this program, and DATA tells the program to read the
data into an SAS data set created by and for this program. The INPUT
statement tells the program the names of the variables and where they are.
Because the variables in the data set are separated by spaces, we inform
the program where they are simply by naming them in the order they
appear in the file. When the input statement is used in this way, missing
data must be represented by a period (.) rather than a blank space since a
blank space cannot represent a missing data point and a space between


Figure 10-4 shows the output produced by this program. The num-
ber of subjects, mean, standard deviation, sum of the scores, the lowest


FIGURE 10-4 Sample SAS PROC CORR output.

and WITH statements had not been included, a correlation matrix con-
sisting of all of the possible combinations (all 13 variables by all 13 variables)
would have resulted. 


Example 2: Charting —SAS:CHART 


Both SPSS-X and SAS systems have very sophisticated graphing options 
including three-dimensional and, if the printer is capable, color graphics. 
The present example demonstrates the Chart procedure, a relatively simple 
one, from SAS. 

The first six lines of the control cards presented in Figure 10—5 are 
the job control language cards. These cards inform the computer of the 
computer time to be allocated and memory required, the type and place 
of printing desired, that the SAS system will be used, and the name 
(CANONI.DATA) and location (disk) of the data set to be used. The next 
two lines include the first three SAS statements and are similar to the SAS 
statements used in Example 1. The next six lines present a series of "IF 
... THEN" statements. These statements convert two of the variables to 
categories. The first four statements deal with chronological age in months 
(CA). Those children with CAs from 36 to 59 months are assigned a score 
of 1; those with CAs from 60 to 83 months are assigned a score of 2; those 
with CAs from 84 to 107 months are assigned a score of 3; and those with 
CAs above 107 are assigned a score of 4. Because the first digit of the 
three-digit ID number represents the subject's sex, those children with IDs 
over 200 are female (F) and those with IDs below 200 are male (M). The 
next line (PUT AGE SEX) creates the computer space for the variables of 
age and sex created in the previous six statements. The PROC FORMAT 
and VALUE statements assign more meaningful values to the variable age.
Thus, those children assigned a 1 for CAs from 36 to 59 months are 3 to 
4 years old; those children assigned a 2 for CAs from 60 to 83 months are 
5 to 6 years old; and so on. 
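
The recoding that the IF ... THEN statements perform can be sketched in Python. This is not SAS syntax and the function names are illustrative. The cut points are those given above; the labels for groups 3 and 4 are assumptions that extend the "and so on" of the text, and an ID of exactly 200 is left uncoded because the text does not cover it.

```python
# Sketch: converting chronological age (in months) and ID number to categories,
# mirroring the IF ... THEN logic described in the text. Names are hypothetical.

def age_group(ca_months):
    """Return the age category score for a chronological age in months."""
    if 36 <= ca_months <= 59:
        return 1
    if 60 <= ca_months <= 83:
        return 2
    if 84 <= ca_months <= 107:
        return 3
    if ca_months > 107:
        return 4
    return None  # below 36 months: not covered by the text

def sex_from_id(subject_id):
    """IDs over 200 are female (F); IDs below 200 are male (M)."""
    if subject_id > 200:
        return "F"
    if subject_id < 200:
        return "M"
    return None  # exactly 200 is not covered by the text

# Labels 1 and 2 follow the text; 3 and 4 are assumed continuations.
AGE_LABELS = {1: "3 to 4 years", 2: "5 to 6 years",
              3: "7 to 8 years", 4: "9 years and over"}

for subject_id, ca in [(105, 40), (213, 72), (118, 110)]:  # hypothetical subjects
    group = age_group(ca)
    print(subject_id, sex_from_id(subject_id), group, AGE_LABELS[group])
```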


FIGURE 10-5 Control cards for charting example using SAS.

pie or circular graph in sections is to be created using the variable age.
Figure 10-6 presents both of these graphs on what would be two pages


The pie chart gives both the frequency (number of subjects) and the
percent of the total sample in each age range. The bar chart shows the
number of males and females and the total number of children in each 
age group. 

Example 3: Multiple Regression—SPSS-X

SPSS, Inc. has published ten versions of SPSS for main frame computers. 
The most recent version, used in this example, is called SPSS-X. 

The current example uses data collected by the second author. The 


The first SPSS-X control card in Figure 10-7, TITLE, is used to 
name the program (SPSSX REGRESSION EXAMPLE). The DATA LIST 


are no more statements (FINISH). 
Figure 10-8 presents two pages of output produced by the above

FIGURE 10-6 Sample SAS PROC CHART output.


FIGURE 10-7 Control cards for regression 
example using SPSS-X. 


are presented. The second page of output shows the multiple correlation 
coefficient (MULTIPLE R), its square (R SQUARE), the adjusted R square 


\[
\hat{Y} = a + b_1 X_1 + b_2 X_2 + \ldots
\]

\[
\hat{Y}(RL) = 1.411 + 1.287(VI) + .793(OP) + .385(ME)
\]

FIGURE 10-8 Sample regression output.


The resulting prediction of receptive language would have a standard error 
of estimate of 5.713. 
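
Applying a prediction equation of this form is a matter of substitution, as the following Python sketch shows. The function name and the three input scores are hypothetical. The constant and the weight for VI are as printed; the weights for OP and ME are read here as .793 and .385, which is an assumption about the printed equation.

```python
# Sketch: predicting receptive language (RL) from vocal imitation (VI),
# object permanence (OP), and means-end (ME) with the regression equation.
# Reading the OP and ME weights as .793 and .385 is an assumption.

INTERCEPT = 1.411
WEIGHTS = {"VI": 1.287, "OP": 0.793, "ME": 0.385}
STANDARD_ERROR_OF_ESTIMATE = 5.713  # as reported in the text

def predict_rl(vi, op, me):
    """Y-hat = a + b1*X1 + b2*X2 + b3*X3."""
    return INTERCEPT + WEIGHTS["VI"] * vi + WEIGHTS["OP"] * op + WEIGHTS["ME"] * me

# Hypothetical child with a score of 10 on each predictor.
predicted = predict_rl(vi=10, op=10, me=10)
print(f"predicted RL = {predicted:.3f} "
      f"(standard error of estimate {STANDARD_ERROR_OF_ESTIMATE})")
```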


Example 4: Analysis of Variance—SPSS-PC+


The data used in this example are identical to the data used in Chapter 9 
on the analysis of variance (ANOVA), with one important difference. In 
the current example, the ten students in each college major are evenly 
divided into males and females. Thus, instead of the simple one-way analysis 
of variance presented in Chapter 9, the current example is a two-way
analysis of variance.

The SPSS-PC+ control statements are quite similar to the SPSS-X
statements (see Figure 10—9). TITLE is again used to name the program, 
in this case SPSSPC ANOVA EXAMPLE. The next two lines (DATA LIST) 
inform the program that each variable is always presented in the same 
column (FIXED) and gives the name and location of each variable (i.e., 
MAJOR in column 1, SEX in column 2, and ABSTR in columns 6 and 7).
VALUE LABELS give names to the categories of each variable. For in-
stance, a subject with a 1 for major and a 1 for sex is a female mathematics
major. While in SPSS-X the command to execute a particular statistic (e.g.,
regression or ANOVA) would appear next, with SPSS-PC+, the data pre-
cede this command. Thus, the next several lines consist of the BEGIN
DATA statement, the data themselves, and the END DATA statement.
After the data, the ANOVA command, indicating that we want to calculate 
an analysis of variance, is presented. This line also specifies abstract rea- 


and that we wish certain statistics produced (STATISTICS = 3)— the means 


Figure 10-10 shows two pages of output from the analysis of variance
program just described. The first page lists the means (and in parentheses
the number of subjects) for the total sample ("population" on the printout),
each major, each sex, and the six cells of the major by sex table. The second
page of output presents the analysis of variance table. The sources of
variation include the total for the two main effects, each of the two main
effects (major and sex), the interaction effect of the two independent var-
iables, the total of the main effects and interaction effect (EXPLAINED),
the within-groups or error (RESIDUAL), and the total. The sum of squares,
degrees of freedom (df), and mean squares are presented for each of these


FIGURE 10-9 Control cards for analysis of variance example using SPSS-PC+.

FIGURE 10-10 Sample analysis of variance output.


sources of variation. F's and the significance of each F are presented for 
each of the effects. 
Of interest are the F's for the three effects: college major, sex, and 
the major by sex interaction. The F for major was found to be 15.094. 
Significance levels are carried out to three decimal places. Thus a signifi-
cance of 0.000 is less than 0.001, that is, less than one chance in a thousand that
the three groups of students with different majors were observed to differ
because of sampling error. The F for sex was found to be 3.352. The
significance level of 0.080 is not low enough (.05 being the highest ac-
ceptable error rate) for us to reject the null hypothesis for the main effect
of sex. That is, any observed differences between females and males should
be considered due to sampling error. Finally, the F for the interaction of 
major and sex was found to be 0.124 with a significance level of 0.884.
Obviously, the null hypothesis for the interaction of these variables is also
not rejected.


from 14.77 to 15.09.
The purpose of this chapter has been to present the reader with an


We suggest that students wishing to develop skills in computer data
analysis consult their university computer center and the suggested read-
ings at the end of this chapter.


SUMMARY


Technological advances in the past 25 years have made computers an in-
tegral part of the functioning of our society. Computers and sophisticated


The steps in using a computer to calculate statistical analyses are:
(1) data organization and coding, (2) storing the data in the computer,
(3) selection of appropriate descriptive and inferential statistics, (4) selection
of appropriate programs for the desired statistics, (5) writing of control
cards, and (6) execution of the computer program.

This chapter has presented four examples of control cards and output 
from “canned” programs. The statistics requested in these examples are 
relatively simple: a two-way analysis of variance, a multiple regression anal-
ysis, a descriptive statistics program, and two relatively simple graphics.


Regardless of which manual is used as a guide, it should be followed
consistently in matters of form and style. The information in this chapter 
is consistent with one of the widely used style manuals, that of the American 
Psychological Association. 


FORMAT OF THE RESEARCH REPORT 


The research report, because of its relative brevity, differs somewhat
from a thesis or dissertation. The following outline presents the sequence
of topics covered in the typical research report prepared according to
the American Psychological Association's (APA) Publication Manual (1983):


I. Title Page 
A. Title 
B. Author's name and affiliation 
C. Running head 
D. Acknowledgements (if any) 
II. Abstract 
III. Introduction (no heading used) 
A. Statement of the problem 
B. Background/review of literature 
C. Purpose and rationale/hypothesis 
IV. Method 
A. Subjects 
B. Apparatus or instrumentation (if necessary) 
C. Procedure 
V. Results 
A. Tables and figures (as appropriate) 
B. Statistical presentation 
VI. Discussion 
A. Support or nonsupport of hypotheses 
B. Practical and theoretical implications 
C. Conclusions 
VII. References 
VIII. Appendix (if appropriate) 


The APA style for typing a manuscript requires double spacing 
throughout the paper. Additional spaces may be used to set off certain 
elements, such as the running head on the title page, but single spacing 
should never be used. Leave margins of 1½ inches at the top, bottom, right,
and left of every page. Number all pages, except the figures, beginning
with the title page. The title page and the abstract are on separate pages
(pages 1 and 2, respectively). A new page is begun for the introduction, 
for the references, for each table and figure, and for each appendix. 
The first page of the report is the title page. This page includes the 
title, author's name, and author's affiliation near the top of the page, sep- 
arated by double spaces. Toward the bottom of the page are the running 
head and acknowledgements, separated by a double space. 
The title should be concise and should indicate clearly the purposes 
of the study. One should keep in mind its possible usefulness to the reader 
who may scan a bibliography in which it may be listed. The title should
not claim more for the study than it actually delivers. It should not be
stated so broadly that it seems to provide an answer that cannot be gen-
eralized, either from the data gathered or from the methodology employed.
For example, if a simple, descriptive, self-concept study were made of a 
group of children enrolled in a particular inner-city elementary school, the 
title should not read, “The Self-Concepts of Inner-City Children.” A more
appropriate title would be “The Self-Concepts of a Group of Philadelphia 
Inner-City Children.” The first title implies broader generalization than is 
warranted by the actual study. 
The title should be typed in upper-case and lower-case letters, cen- 
tered, and, when two or more lines are needed, double spaced. The running 
head, a shortened version of the title, should be a maximum of 50 characters 
including letters, punctuation, and spaces between words. The running 
head is typed near the bottom of the page in upper-case letters. 
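
Because the 50-character limit counts every letter, punctuation mark, and space, it can be checked mechanically. The following is a minimal sketch in Python, offered only as an illustration; the helper name `check_running_head` is hypothetical and is not part of the APA manual.

```python
MAX_RUNNING_HEAD = 50  # limit stated above: letters, punctuation, and spaces


def check_running_head(text: str) -> str:
    """Return the running head in upper case, or raise if it is too long."""
    # Collapse stray whitespace, then convert to upper-case letters.
    head = " ".join(text.split()).upper()
    if len(head) > MAX_RUNNING_HEAD:
        raise ValueError(
            f"running head has {len(head)} characters; "
            f"the maximum is {MAX_RUNNING_HEAD}"
        )
    return head


if __name__ == "__main__":
    print(check_running_head("Irish moral reasoning"))
```

A head that exceeds the limit raises an error rather than being cut off, since shortening a title is a judgment the writer should make.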
Acknowledgements appear as unnumbered footnotes near the bottom 
of the title page. Acknowledgements are used to indicate the basis of a 
study (e.g., doctoral dissertation), grant support, review of prior draft of 
the manuscript, and assistance in conducting the research and/or preparing 
the manuscript. They should be clearly and directly stated. Figure 11-1 
illustrates a sample title page used in submitting a manuscript that was 
subsequently published (Kahn, 1982). 
The abstract, on page 2 of the research report, describes the study 
in 100 to 150 words. Included in this summary are the problem under 
study, characteristics of the subjects, the procedures used (e.g., data-gath- 
ering techniques, intervention procedures), the findings of the study, and 
the conclusions reached by the researcher. A good abstract will increase 
the readership of the article because many persons start their reviews with 
abstracts. 
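
The 100-to-150-word range can also be checked before submission. The sketch below is only an illustration: the function name `abstract_length_ok` is hypothetical, and counting whitespace-separated tokens as words is an assumption, since the text does not say how words are to be counted.

```python
def abstract_length_ok(abstract: str, minimum: int = 100, maximum: int = 150):
    """Return (within_range, word_count) for an abstract.

    Words are counted as whitespace-separated tokens, which is an
    assumption made for this sketch.
    """
    count = len(abstract.split())
    return minimum <= count <= maximum, count


if __name__ == "__main__":
    ok, count = abstract_length_ok("word " * 120)
    print(ok, count)
```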


Main Body of the Report 


The main body of the research report is divided into four major sections: 
introduction, method, results, and discussion. The first of these sections, the 
introduction, begins a new page (page 3) and, because of its position, does 
not need or have a label.

Moral Reasoning in Irish Children and Adolescents
as Measured by the Defining Issues Test 
James V. Kahn 


University of Illinois at Chicago 


This research was conducted while the author was a 
Senior Fulbright-Hays Scholar at University College, 
Cork, Ireland and on sabbatical leave from the 
University of Illinois at Chicago. I wish to 
acknowledge the assistance of the computer 
facilities at both universities. I also wish to 
thank the many children, teachers and administrators 
at the various schools at which the data reported 
were collected. I also wish to thank Rose Naputano 
for her secretarial assistance and Larry Nucci for 
his critical comments. 


RUNNING HEAD: IRISH MORAL REASONING 


FIGURE 11-1 Example of title page. 


332 The Research Report 


A well-written introduction has three components. The researcher 
must give a clear and definitive statement of the problem. As described in 
Chapter 2 for research proposals, the problem must indicate the need for 
the research. It is also necessary to indicate why the problem is important 
in terms of theory and/or practice. 

A review of previous literature on the topic is also an essential com- 
ponent of the introduction. The researcher must demonstrate an under- 
standing of the existing literature pertinent to his or her study. However, 
although an exhaustive review is an appropriate part of a thesis or disser- 
tation, it is not included in a research report. The author should assume 
that the reader has some knowledge of the field being investigated. Only 
research that is pertinent to the issue under investigation should be in- 
cluded. The author also needs to logically connect the previous body of 
literature with the current work. 

The final component of the introduction includes a clear rationale 


investigation of an early intervention program with children at high risk 
for mental retardation should not hypothesize that “the high-risk children 


would be, “high-risk children receiving the intervention program will have 
greater gains in IQ than will their control group peers.” 

The main body of the report continues with the method section, which 
follows the introduction. It includes two or more subsections and describes 


determine how appropriate the procedures were and how much credence 
to give the results. A well-written method section is sufficiently detailed to 
enable a reader to replicate the components of the study. The method 
section is separated from the introduction by the centered heading, “Method.” 
Generally, subsections are then labeled at the left margin and underlined. 

The method section always should include at least two subsections: 
subjects and procedures. The subsection on subjects needs to identify the 


characteristics, such as age, sex, socioeconomic status, and race, are included
as they relate to the study. Sufficient information must be provided to 
permit the reader to be able to replicate the sample. 

The procedures subsection describes the actual steps carried out in
conducting the study. This includes the measurement devices, if no separate
section is provided; the experimental treatments; the assignment of subjects
to conditions; the order of assessments, if more than one; the time period,
if pertinent; and any design features used to control potentially confound- 
ing variables. Again, enough information must be provided to permit rep- 
lication. However, procedures that are published in detail elsewhere should 
only be summarized with the citation given for the other publication. 

Additional subsections may be included as deemed necessary. For
instance, if a battery of complex tests is to be used and described, a
separate subsection on instrumentation would be appropriate. Complex 
designs also might be better described in a separate section. 

The third section of the main body is results. The results section pre- 
sents the data and the statistical analyses without discussing the implications 
of the findings. Individual scores or raw data are only presented in single- 
subject—or very small sample size—studies. All relevant findings are pre- 
sented, including those that do not support the hypothesis. Tables and 
figures are useful to supplement textual material. They should be used 
when the data cannot readily be presented in a few sentences in the text. 
Data in the text and in tables or figures should not be redundant; rather,
they should be complementary. The text should indicate what the reader
should expect to see in the tables and figures so as to clarify their meaning. 
The level of significance for statistical analyses should be presented. 

Finally, the report's main body concludes with the discussion section. 
After presenting the results it is possible to determine the implications of 
the study, including whether the hypotheses were supported or should be 
rejected. It is appropriate to discuss both theoretical implications and prac- 
tical applications of the study. A brief discussion of limitations of the present 
investigation and proposals for future research is appropriate. New hy- 
potheses may be proposed if the data do not support the original hy- 
potheses. The researcher should also include conclusions that reflect whether 
the original problem is better understood, or even resolved, as a result of
this study.


References and Appendices 


The reference section of the manuscript begins a new page with the label 
"References," centered. References consist of all documents, including jour- 
nal articles, books, chapters, technical reports, computer programs, and 
unpublished works that are mentioned in the text of the manuscript. A 
reference section should not be confused with a bibliography: a bibliog-
raphy contains everything that would be in the reference section plus other
publications that are useful but were not cited in the manuscript. Bibli-
ographies are not generally provided for research reports; only references
are usually included.

References are arranged in alphabetical order by the last names of
the first-named authors. When no author is listed, the first word of the
title or sponsoring organization is used to begin the entry. Each reference
starts at the left margin of the page, with subsequent lines double spaced
and indented. No extra spaces separate the entries. 

An appendix may be useful in providing detailed information that 
would seem inappropriate or too long for the main body of the paper. 
Each appendix begins on a new page with the label “Appendix” and its 
identifying letter, centered. Following this label is the centered title of the 
appendix and then the material. Materials that generally should be in an 
appendix include: a new computer program, unpublished tests, lengthy 
treatments that are not available elsewhere, and so on. 


THE THESIS OR DISSERTATION 


Research theses and dissertations follow the same outline as described for 
the research report. The major difference of the thesis and dissertation is 
length and comprehensiveness. Many institutions have their own style man- 
uals for these major research papers; they may require a certain order of 
topics, the designation of each major (and some minor) section as a chapter,
bibliographies in place of reference sections, and more complete appen- 
dices. Since a goal of the thesis or dissertation is to demonstrate the student's 
knowledge in a particular field, it is more appropriate to be complete and 


For years it was considered inappropriate for a researcher to use 
personal pronouns such as I, me, we, and so forth; people thought their 
use indicated a lack of objectivity. This changed, however, when the second 
edition of the APA's Publication Manual was published in 1974. Personal
pronouns should be used when they are appropriate. “I believe . . .” is
preferable to “The present author believes . . . .” The writer should, how-
ever, refrain from using plural personal pronouns (e.g., we) unless there
are multiple authors. 

Only the last names of cited authorities are used. Titles such as pro- 
fessor, Dr., Mr., and Dean are omitted. The past tense should be used in 
describing research procedures that have been completed. 

Abbreviations may be used only after their referent has been spelled
out, with the abbreviation following in parentheses. There are a few ex- 
ceptions to this rule for well-known abbreviations such as IQ. 


Discussion of quantitative terms. “Few in number” and “less in quantity”
are the preferred forms of expression. Numbers beginning a sentence
should always be spelled out. Fractions and numbers of less than ten should
be spelled out. Use “one-half,” but for all figures with fractions, use “4½”
or “4.5.” Percent (meaning “per hundred”) is spelled out except in tables
and figures. Use Arabic numerals with percent (“18 percent”), unless they 
begin a sentence. Percentage means “proportion.” In numbers with more 
than three digits, commas should point off thousands or millions (1,324; 
12,304,000). 
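
Several of these rules are mechanical enough to express in a few lines of code. The following Python sketch is an illustration only; the names `format_number` and `format_percent` are hypothetical. It covers whole numbers in running text (not tables or figures) and deliberately refuses to spell out a number of ten or more at the start of a sentence, since that is better handled by rewording.

```python
SMALL_NUMBERS = ["zero", "one", "two", "three", "four",
                 "five", "six", "seven", "eight", "nine"]


def format_number(n: int, starts_sentence: bool = False) -> str:
    """Format a whole number for running text."""
    if 0 <= n < 10:
        # Numbers of less than ten are spelled out.
        word = SMALL_NUMBERS[n]
        return word.capitalize() if starts_sentence else word
    if starts_sentence:
        # A number beginning a sentence must be spelled out; this sketch
        # does not attempt that for larger numbers.
        raise ValueError("spell the number out by hand or reword the sentence")
    # Commas point off thousands and millions.
    return f"{n:,}"


def format_percent(value: int) -> str:
    """Arabic numeral followed by the spelled-out word 'percent'."""
    return f"{value} percent"


if __name__ == "__main__":
    print(format_number(7), format_number(1324), format_percent(18))
```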

Ordinarily, standard statistical formulas are not presented in the re- 
search report, nor are computations included. If a rather unusual formula 
is used in the analysis, it is appropriate to include it. 

Of course, the ordinary rules of correct usage should prevail. A good 
dictionary, a spelling guide, a handbook of style, and a thesaurus are helpful 
references. 

We have frequently found in our own students’ work errors of spell- 
ing, nonagreement between subject and predicate, nonparallel construc- 
tion, and inconsistent tense sequence. Students who have difficulty in writ- 
ten expression should have a competent friend or relative proofread their 
copy for correct usage before they type the final manuscript. Inability to 
write correctly is a serious limitation. Carelessness is an equally great fault. 

Writing research reports effectively is not an easy task. Good reports 
are not written hurriedly. Even skillful and experienced writers revise many 
times before they submit a manuscript for publication. 


TYPING THE REPORT


Many students type their own term papers or research reports. Anyone 
with reasonable proficiency and a willingness to learn proper procedures 
can do an acceptable job. In fact, typing a report is an excellent way to
learn proper form.

Typographical standards for the thesis or dissertation are more ex-
acting. Strikeovers, crossovers, insertions, and erasures are not permitted.
Therefore, only typists with great proficiency should attempt to prepare
thesis or dissertation copy. Although the expense of professional typing 
may seem high, the saving of time and excessive effort usually makes this 
arrangement the wise choice. 

It is the writer's responsibility to present manuscript material to the 
professional typist in proper form. Except for minor typographical matters, 
the correction of major errors is not the responsibility of the typist. After 
the material is received from the typist, the student should proofread it 
carefully before it is turned in. Of course, computers and word-processing
programs may negate the need for hiring a typist.


RULES OF TYPOGRAPHY 


1. A good quality of bond paper, 8½" by 11" in size and of 13- to 16-pound
weight, should be used. Only one side of the sheet is used in typewritten
manuscript.

2. All margins should be 1½ inches: top, bottom, left, and right.

3. All material should be double spaced. 

4. Words should not be divided at the end of the line unless completing 
them would definitely interfere with the margin. A few spaces of 
runover is preferable. In dividing words, consult a dictionary for 
correct syllabication. 

5. Direct quotations not over three typewritten lines in length are in-
cluded in the text and enclosed in quotation marks. Quotations of
more than three lines are set off from the text in a double-spaced 
paragraph and indented five spaces from the left margin without 
quotation marks. Original paragraph indentations are retained. 

6. Page numbers are given in parentheses at the end of a direct quo- 
tation. 

7. Underlining words or letters informs the printer to set those words
or letters in italics. For example, book titles are underlined in a typed 
manuscript and printed in italics in a journal or book. 


REFERENCE FORM 


References are cited in the text by giving the last name(s) of the author(s)
and the year the reference was written. If the author's name does not appear in
the text, the name and year appear in parentheses, separated by a comma. 
If the author's name is used in the text, the year follows the name in 
parentheses. When more than one work is cited in parentheses, the ref-
erences are separated by semicolons. Page numbers are given, in
parentheses, only for direct quotations. See examples of this style of citation
throughout this book. 
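
The citation patterns just described can be sketched as three small Python functions. This is an illustration, not a complete citation tool. The function names are hypothetical, the author string is assumed to be already joined (for example, "Schmidt, Kahn, & Nucci"), and the "p. 7" page format is an assumption, since the text says only that page numbers are given in parentheses for direct quotations.

```python
def cite_parenthetical(authors: str, year, page=None) -> str:
    """Author not named in the sentence: (Kahn, 1982)."""
    parts = [authors, str(year)]
    if page is not None:
        # Page numbers are given only for direct quotations.
        parts.append(f"p. {page}")
    return "(" + ", ".join(parts) + ")"


def cite_in_text(authors: str, year) -> str:
    """Author named in the sentence: Kahn (1982)."""
    return f"{authors} ({year})"


def cite_several(works) -> str:
    """Several works in one set of parentheses, separated by semicolons.

    The works are kept in the order given.
    """
    return "(" + "; ".join(f"{a}, {y}" for a, y in works) + ")"


if __name__ == "__main__":
    print(cite_parenthetical("Kahn", 1982))
    print(cite_in_text("Kahn", 1982))
    print(cite_several([("Kahn", 1982), ("Seltzer", 1984)]))
```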


All materials referred to in the text, and only those, are listed alpha-
betically in the reference section of the manuscript. The Publication Manual
of the American Psychological Association (1983) has specific guidelines 
for the format of various types of work. The following illustrates the form 
that different types of references should take. 


1. Book:

Vaizey, J. (1967). Education in the modern world. New York:
McGraw-Hill.

2. Book with multiple authors:

Barzun, J. & Graff, H. F. (1977). The modern researcher. New York:
Harcourt, Brace, Jovanovich.

3. Book in subsequent edition:

Hallahan, D. P. & Kauffman, J. M. (1982). Exceptional children (2nd
ed.). Englewood Cliffs, NJ: Prentice-Hall.

4. Editor as author:

Mitchell, J. V., Jr. (Ed.). (1985). Mental measurement yearbook (9th
ed.). Highland Park, NJ: Gryphon Press.

5. No author given:

Prentice-Hall author's guide. (1978). Englewood Cliffs, NJ:
Prentice-Hall.

6. Corporate or association author:

American Psychological Association. (1983). Publication manual (3rd
ed.). Washington, DC: Author.

7. Part of a series of books:

Terman, L. M. & Oden, M. H. (1947). Genetic studies of genius series:
Vol. 4. The gifted child grows up. Stanford, CA: Stanford Univer-
sity Press.

8. Chapter in an edited book:

Kahn, J. V. (1984). Cognitive training and its relationship to the lan-
guage of profoundly retarded children. In J. M. Berg (Ed.), Per-
spectives and progress in mental retardation. Baltimore: University
Park, 211–219.

9. Journal article:

Seltzer, M. M. (1984). Correlates of community opposition to com-
munity residences for mentally retarded persons. American Journal
of Mental Deficiency, 89, 1–8.

10. Magazine article:

Meer, J. (1984, August). Pet theories. Psychology Today, pp. 60–67.

11. Unpublished paper presented at a meeting:

Schmidt, M., Kahn, J. V., & Nucci, L. (1984, May). Moral and social 
conventional reasoning of trainable mentally retarded adolescents. 
Paper presented at the annual meeting of the American Association
on Mental Deficiency, Minneapolis, MN. 

12. Thesis or dissertation (unpublished): 

Best, J. W. (1948). An analysis of certain selected factors underlying 
the choice of teaching as a profession. Unpublished doctoral dis- 
sertation, University of Wisconsin, Madison. 

13. Unpublished manuscripts: 


Kahn, J. V., Jones, C., & Schmidt, M. (1984). Effect of object pref-
erence on sign learnability by severely and profoundly retarded
children: A pilot study. Unpublished manuscript, University of Illi-
nois at Chicago.

Kahn, J. V. (1984). Cognitive training and language learning. Man- 
uscript submitted for publication. 


14. Chapter accepted for publication:

Kahn, J. V. (in press). Cognitive assessment with mentally retarded
infants and preschoolers. In T. D. Wachs & R. Sheehan (Eds.),
Assessment of developmentally delayed infants and preschoolers:
A transdisciplinary approach. New York: Plenum.

15. Technical report: 

Kahn, J. V. (1981). Training sensorimotor period and language skills
with severely retarded children. Chicago, IL: University of Illinois
at Chicago. (ERIC Document Reproduction Service, No. ED 204 941).
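
The book and journal-article patterns illustrated above, together with the rule that entries are arranged alphabetically, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, only two of the fifteen entry types are covered, the author string is supplied already formatted, and plain text cannot show the underlining that marks titles for italics.

```python
def format_book(authors, year, title, city, publisher):
    """Book entry: Author. (Year). Title. City: Publisher."""
    return f"{authors} ({year}). {title}. {city}: {publisher}."


def format_article(authors, year, title, journal, volume, pages):
    """Journal article entry: Author. (Year). Title. Journal, volume, pages."""
    return f"{authors} ({year}). {title}. {journal}, {volume}, {pages}."


def sort_references(entries):
    """Arrange entries alphabetically, ignoring capitalization.

    Each entry begins with the first author's last name (or the first
    word of the title when no author is listed), so sorting the finished
    strings gives the required order in simple cases.
    """
    return sorted(entries, key=str.lower)


if __name__ == "__main__":
    refs = [
        format_book("Vaizey, J.", 1967, "Education in the modern world",
                    "New York", "McGraw-Hill"),
        format_article("Seltzer, M. M.", 1984,
                       "Correlates of community opposition to community "
                       "residences for mentally retarded persons",
                       "American Journal of Mental Deficiency", 89, "1-8"),
    ]
    for entry in sort_references(refs):
        print(entry)
```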


PAGINATION 


Page numbers are assigned to each page of the paper or report. The title 
page does not have a page number typed on it, but a number is allowed 
for it in the series. 

Page numbers are placed in the upper right-hand corner, one inch 
below the top of the page and aligned with the right margin. Pages are 
numbered consecutively from the title page, through the abstract, main 
body of the paper, and references. After the references come the footnotes 
(if any), tables, figures, and appendices (if any), the numbering of pages 
continuing in this order. 

In addition, each page except the title page has a short title (the
running head) typed above the page number (usually the first two or three
words of the whole title). This is so that if the pages are separated, they
can be identified with the appropriate manuscript. 


TABLES


A table is a systematic method of presenting statistical data in vertical
columns and horizontal rows, according to some classification of subject 
matter. Tables enable the reader to comprehend and interpret masses of 
data rapidly and to grasp significant details and relationships at a glance. 
Tables and figures should be used sparingly; too many will overwhelm the 
reader. 

Good tables are relatively simple, concentrating on a limited number 
of ideas. Including too much data in a table minimizes the value of tabular 
presentation. It is often advisable to use several tables rather than to include 
too many details in a single one. It has been said that the mark of a good 
table is its effectiveness in conveying ideas and relationships independently 
of the text of the report. 

Because each table is on a separate page following the references, the 
desired placement of the table is indicated by the following method. 

Text references should identify tables by number, rather than by such 
expressions as “the table above,” or “the following table.” Tables should
rarely be carried over to the second or third page. If the table must be 
continued, the headings should be repeated at the top of each column of 
data on each page. 

Tables should not exceed the page size of the manuscript. Large tables 
that must be folded into the copy are always cumbersome and cannot be 
easily refolded and replaced. Large tables should be reduced to manuscript 
page size by photostat or some other process of reproduction. Tables that 
are too wide for the page may be turned sideways, with the top facing the 
left margin of the manuscript. 

See Figure 11-2 for a sample of a properly presented table. The word
table is centered between the page margins and typed in capital letters, 
followed by the table number in arabic numerals. Tables are numbered 
consecutively throughout the entire report or thesis, including those tables 
that may be placed in the appendix. The caption or title is placed one 
double space below the word table and centered. No terminal punctuation 
is used. The main title should be brief, clearly indicating the nature of the 
data presented. Occasionally a subtitle is used to supplement a briefer main 
title, denoting such additional information as sources of data and measuring 
units employed.

Column headings, or box heads, should be clearly labeled, describing 
the nature and units of measure of the data listed. If percentages are 
presented, the percentage symbol (%) should be placed at the top of the 
column, not with the number in the table. 

If numbers are shortened by the omission of zeros, that fact should
be mentioned in the subtitle (“in millions of dollars”; “in thousands of
tons”). The “stub,” or label, for the rows should be clear and concise, parallel
in grammatical structure, and if possible, no longer than two lines.


                        TABLE 2

          Occupations of Fathers of University
         of Wisconsin Seniors Preparing to Teach

                           Men             Women
Occupations              N      %        N      %

[label illegible]       24     23       32     29
Skilled labor           19     18       10      9
Farming                 17     17       19     17
Clerical-sales          16     16       18     16
Profession              15     15       20     18
Unskilled labor          6      6        6      5
No data                  5      5        7      6

Total                  102    100      112    100

a Adapted from Best, J. W. (1948), An analysis of certain selected
factors underlying the choice of teaching as a profession.
Unpublished doctoral dissertation, University of Wisconsin,
Madison.

b Percentages rounded to equal 100%.


FIGURE 11-2 A sample table.

Decimal points should always be carried out to the same place (e.g., 
able for a particular cell, indicate the lack by a dash, rather than a zero. 
When footnotes are needed to explain items in the table, small letters are
used. Numerical superscripts would be confused with the data contained
in the table. Asterisks are used to indicate probability levels and are also 
placed below the table. 
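
Two of these conventions, carrying decimals to the same place and marking a missing value with a dash rather than a zero, lend themselves to a short Python sketch. It is an illustration only; the names `format_cell` and `format_column` are hypothetical, and a missing value is represented here by `None`.

```python
def format_cell(value, places: int = 2) -> str:
    """Format one table cell; a missing value becomes a dash, not a zero."""
    if value is None:
        return "-"
    return f"{value:.{places}f}"


def format_column(values, places: int = 2):
    """Format a column so every entry has the same number of decimal
    places and is right-aligned to a common width."""
    cells = [format_cell(v, places) for v in values]
    width = max(len(c) for c in cells)
    return [c.rjust(width) for c in cells]


if __name__ == "__main__":
    for cell in format_column([1.5, None, 12.25]):
        print(cell)
```

Right-aligning the cells keeps the decimal points in a vertical line whenever every value has the same number of decimal places.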


A figure is a device that presents statistical data in graphic form. The term 
figure is applied to a wide variety of graphs, charts, maps, sketches, dia-
grams, and drawings. When skillfully used, figures present aspects of data
in a visualized form that may be clearly and easily understood. Figures
should not be intended as substitutes for textual description, but included 
to emphasize certain significant relationships. 

Many of the qualities that were listed as characteristics of good tables 
are equally appropriate when applied to figures. 


1. The title should clearly describe the nature of the data presented.
2. Figures should be simple enough to convey a clear idea and should 
be understandable without the aid of much textual description. 

3. Numerical data upon which the figure is based should be presented 
in the text or an accompanying table, if they are not included in the 
figure itself. 

4. Data should be presented carefully and accurately, so that oversim- 
plification, misrepresentation, or distortion do not result. 

5. Figures should be used sparingly. Too many figures detract from, 
rather than illuminate, the presentation. 

6. Figures follow tables in the order of items in a manuscript. The place- 
ment desired in the text is indicated in the same manner used to 
indicate the placement of tables. 

7. Figures should follow, not precede, the related textual discussion. 

8. Figures are referred to by number, never as “the figure above” or
“the figure below.” 

9. The title and number of the figure are placed on a separate page that
precedes the figure in the manuscript. 


The Line Graph 


The line graph is useful in showing change in data relationships over a 
period of time. The horizontal axis usually measures the independent var- 
iable, the vertical axis the measured characteristic. Graphic arrangement 
should proceed from left to right on the horizontal axis, and from bottom 
to top on the vertical. The zero point should always be represented, and 
scale intervals should be equal. If a part of the scale is omitted, a set of 
parallel jagged lines should be used indicating that part of the scale is 
omitted (see Figure 11-3).


FIGURE 11-3 A line graph.


We have devised two figures, a line graph and a bar graph, that depict
data relationships that were presented textually in two journal articles (see
Figures 11-4 and 11-5).

When several lines are drawn, they may be distinguished by using 
various types of lines—solid, dotted, or alternate dots and dashes. Black 
ink is used. 

A smoothed curve cannot be obtained by plotting any data directly. 
Only when infinite data are obtained will the lines connecting the points 
approach a curved line. The figure formed by the lines connecting the 
points is known as a frequency polygon. 


The Bar Graph or Chart 


The bar graph, which can be arranged either horizontally or vertically,
represents data by bars of equal width, drawn to scale length. The numerical
data may be lettered within the bar or outside it. A grid may be used to
help quantify the graphic representation. A divided bar graph represents
the components of a whole unit in one bar (see Figure 11-6).


FIGURE 11-4 Mean verbal SAT scores, 1972-1977, in four selected states. (Graphic representation by
the author, adapted from Kappan interview with Ernest Sternglass, “The Nuclear Radiation/
SAT Score Decline Connection,” Phi Delta Kappan, 61 [Nov. 1979], 184.)

FIGURE 11-5 Decline of mean Scholastic Aptitude Test scores from 1963 to 1978. (Throughout the 1970s,
verbal scores declined .04 standard deviations each year and mathematical scores declined
.025 standard deviations.)


In bar graphs, the bars are usually separated by space. If the graph 
contains a large number of items, the bars may be joined to save space. 

Horizontal bar graphs are usually used to compare components at a 
particular time. Vertical bars are used when making comparisons at dif- 
ferent times. 


The Circle, Pie, or Sector Chart 


Circle, pie, or sector charts show the division of a unit into its component 
parts. They are frequently used to explain how a unit of government 
distributes its share of the tax dollar, how an individual spends his or her 
salary, or any other type of simple percentage distribution. 

The radius is drawn vertically, and components are arranged in a 
clockwise direction in descending order of magnitude. The proportion of 
data is indicated by the number of degrees in each section of the 360-
degree circle (see Figure 11-7).


FIGURE 11-6 Divided bars for graphs (horizontal, vertical, and divided).

FIGURE 11-7 Educational backgrounds of 129 American celebrities listed in the Current Biography 1966
Yearbook. (Adapted from Adela Deming, “The Educational Attainments of Americans Listed
in the Current Biography 1966 Yearbook.” Unpublished report, Butler University, Indian-
apolis, Indiana, 1967, p. 7.)

This kind of data should be typed or printed within the segment if 
possible. If there is insufficient room for this identification, a small arrow 
should point from the identification term to the segment. 
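
The arithmetic behind a circle chart is simple: each component receives the share of 360 degrees that matches its share of the whole, and the components are laid out in descending order of magnitude. The Python sketch below illustrates the calculation; the function name `sector_angles` is hypothetical and the category counts are invented for the example, not taken from Figure 11-7.

```python
def sector_angles(counts):
    """Return (label, degrees) pairs in descending order of magnitude.

    Each component's angle is its proportion of the total multiplied by
    360, so the angles sum to the full circle.
    """
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(label, 360 * value / total) for label, value in ordered]


if __name__ == "__main__":
    # Hypothetical data, used only to show the calculation.
    example = {"Category B": 30, "Category A": 50, "Category C": 20}
    for label, degrees in sector_angles(example):
        print(f"{label}: {degrees:.1f} degrees")
```

Drawing the sectors clockwise from the vertical radius in the order returned follows the arrangement described above.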


Maps 
When geographic location or identification is important, maps may be used. 
Identification may be made by the use of dots, circles, or other symbols, 
and density or characteristics of areas can be represented by shading or 
crosshatching. A key or legend should always be supplied if shadings are 
used. 


Organization Charts 


To show staff functions, lines of authority, or flow of work within an
organization, an organization chart is a helpful graphic device. 


authority, supervision, or movement of materials flows from the top to the 
bottom of the chart, but variations can be indicated by the use of arrows. 


EVALUATING A RESEARCH REPORT 


Writing a critical analysis of a research report is a valuable experience for
the student of educational research. Reports for this purpose may be taken
from published collections and such periodicals as the Educational Re- 
searcher, the Journal of Educational Research, or one of the many other pub- 
lications that publish reports of research in education or in the closely 
related fields of psychology or sociology. Unpublished research reports 
written by previous students of educational research are another source, 
as are the theses or dissertations found in the university library.

Through a critical analysis, the student may gain some insight into
the nature of a research problem, the methods by which it may be attacked, 
the difficulties inherent in the research process, the ways in which data are 
analyzed and conclusions drawn, and the style in which the report is pre-
sented.

The following questions are suggested as a possible structure for the
analysis:


1. The Title and Abstract
a. Are they clear and concise?
b. Do they promise no more than the study can provide?

2. The Problem and Hypotheses (Introductory Section)
a. Is the problem clearly stated?
b. Is the problem properly delimited?
c. Is the significance of the problem recognized?
d. Are hypotheses clearly stated and testable?
e. Are assumptions, limitations, and delimitations stated?
f. Are important terms defined?

3. Review of Related Literature (Introductory Section)
a. Is it adequately covered?
b. Are important findings noted?
c. Is it well organized?
d. Is an effective summary provided?
e. Is the literature cited directly relevant to the problem and hy-
potheses?

4. Method Section
a. Is the research design described in detail?
b. Is it adequate?
c. Are the samples described in detail?
d. Are relevant variables recognized?
e. Are appropriate controls provided to establish experimental va-
lidity?
f. Are data-gathering instruments appropriate?
g. Are validity and reliability of the instruments established?
h. Can the sample and procedure be replicated based on the infor-
mation and references given?

5. Results Section
a. Is the statistical treatment appropriate?
b. Is appropriate use made of tables and figures?
c. Is the analysis of data relationships logical, perceptive, and ob-
jective?

6. Discussion Section
a. Is the discussion clear and concise?
b. Is the problem/hypothesis restated appropriately?
c. Is the analysis objective?
d. Are the findings and conclusions justified by the data presented
and analyzed?
e. Did the author(s) generalize appropriately or too much?

7. Overall Writing of Paper
a. Is it clear, concise, and objective?
b. Are the parts of the paper properly related to each other?


SUMMARY


The research report is expected to follow the conventional pattern of style 
and form used in academic circles. Although style manuals differ in some 
of the smaller details, students are expected to be consistent in following 
the pattern of style contained in the manual required by their institution 
or in the one that they are permitted to select. 

The style of writing should be clear, concise, and completely objective. 
Of course, the highest standards of correct usage are expected, and careful 
proofreading is necessary before the final report is submitted. 

Tables and figures may help to make the meaning of the data clear. 
They should be presented in proper mechanical form and should be carefully 
designed to present an accurate and undistorted picture. 

The evaluation of a research project is a valuable exercise for students 
of educational research. Using analytical questions such as those suggested, 
the critiquing of another researcher's report helps students develop com- 
petency in their own research and reporting skills. 
Appendix A 


STATISTICAL FORMULAS AND 
SYMBOLS 


STATISTICAL FORMULAS AND GLOSSARY OF STATISTICAL SYMBOLS 


1. $>$ is greater than 
   $<$ is less than 
   $a > b$: a is greater than b 
   $b < a$: b is less than a 

2. Mean 
   $$\bar{X} = \frac{\sum X}{N}$$ 
   $\bar{X}$: arithmetic average 
   $\sum$: sum of 
   $X, Y$: scores 
   $N$: number of scores 

3. Mode ($M_o$) 
   $M_o$: mode, the score that occurs most frequently in a distribution 

4. Percentile rank ($P_r$) 
   $$P_r = 100 - \frac{100R - 50}{N}$$ 
   $P_r$: percentage of scores that fall below a given value, plus 1/2 the 
   percentage of space occupied by that score 
   $R$: rank from the top of a distribution 
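
The first few entries can be checked by hand on a small set of scores. The following is a minimal sketch in Python, not part of the original appendix; the function names and the sample scores are illustrative assumptions, and the percentile-rank function assumes the formula $P_r = 100 - (100R - 50)/N$ as read from item 4.

```python
from collections import Counter


def mean(scores):
    """Arithmetic average: sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


def modes(scores):
    """Score(s) that occur most frequently; a distribution may have several."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(score for score, count in counts.items() if count == highest)


def percentile_rank(rank_from_top, n):
    """Percentile rank from a rank counted from the top of the distribution."""
    return 100 - (100 * rank_from_top - 50) / n


# Hypothetical scores, used only for illustration.
scores = [70, 75, 75, 80, 90]
print(mean(scores), modes(scores), percentile_rank(1, len(scores)))
```

With five scores, the top-ranked score (R = 1) receives a percentile rank of 90 rather than 100, because half of the space occupied by that score is counted as lying above it.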
5. Variance ($\sigma^2$) and standard deviation ($\sigma$) 
   Deviation computation: 
   $$\sigma^2 = \frac{\sum x^2}{N} \qquad \sigma = \sqrt{\frac{\sum x^2}{N}}$$ 
   Raw score computation: 
   $$\sigma^2 = \frac{N\sum X^2 - (\sum X)^2}{N^2} \qquad \sigma = \sqrt{\frac{N\sum X^2 - (\sum X)^2}{N^2}}$$ 
   $\sigma^2$: population variance, the mean value of the squared deviations 
   from the mean 
   $\sigma$: population standard deviation, the positive square root of 
   the variance 
   $x = (X - \bar{X})$: deviation from the mean 

6. Variance ($S^2$) and standard deviation ($S$) 
   Deviation computation: 
   $$S^2 = \frac{\sum x^2}{N - 1} \qquad S = \sqrt{\frac{\sum x^2}{N - 1}}$$ 
   Raw score computation: 
   $$S^2 = \frac{N\sum X^2 - (\sum X)^2}{N(N - 1)} \qquad S = \sqrt{\frac{N\sum X^2 - (\sum X)^2}{N(N - 1)}}$$ 
   $S^2$: variance of a population estimated from a sample 
   $S$: standard deviation of a population estimated from a sample 

   Variance ($S^2_{DV}$) or standard deviation ($S_{DV}$) of a dichotomous variable 
   General formula: 
   $$S^2_{DV} = NP(1 - P) \qquad S_{DV} = \sqrt{NP(1 - P)}$$ 
   When $P = .50$: 
   $$S^2_{DV} = \frac{N}{4} \qquad S_{DV} = \sqrt{\frac{N}{4}}$$ 
   Dichotomous variable: an outcome is either-or: plus or minus, true or 
   false, heads or tails 
   $N$: number of events 
   $P$: probability of an outcome 
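
The deviation and raw score computations of the variance are algebraically equivalent, which a short calculation can confirm. This is a minimal sketch in Python under stated assumptions: the function names and the sample scores are hypothetical and are not taken from the text.

```python
import math


def population_variance_deviation(scores):
    """Deviation computation: mean of the squared deviations from the mean."""
    n = len(scores)
    m = sum(scores) / n
    return sum((x - m) ** 2 for x in scores) / n


def population_variance_raw(scores):
    """Raw score computation: [N * sum(X^2) - (sum X)^2] / N^2."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return (n * sum_x2 - sum_x ** 2) / n ** 2


def sample_variance_raw(scores):
    """Population variance estimated from a sample: divide by N(N - 1)."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


def dichotomous_variance(n, p):
    """Variance of a dichotomous variable: N * P * (1 - P)."""
    return n * p * (1 - p)


# Hypothetical scores, used only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance_deviation(scores), population_variance_raw(scores))
print(math.sqrt(sample_variance_raw(scores)))
print(dichotomous_variance(100, 0.5))
```

Dividing by N - 1 instead of N makes the sample estimate slightly larger than the population value computed from the same scores; the difference shrinks as N grows.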


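7. Standard error of the mean ($S_{\bar{X}}$) 
   $$S_{\bar{X}} = \frac{S}{\sqrt{N}} \quad \text{or} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$$ 

8. Standard scores ($z$, $T$, $Z_{cb}$) 
   $$z = \frac{X - \bar{X}}{\sigma} \quad \text{or} \quad z = \frac{x}{\sigma}$$ 
   $$T = 50 + 10\,\frac{(X - \bar{X})}{\sigma} \quad \text{or} \quad T = 50 + 10z$$ 
   $$Z_{cb} = 500 + 100z$$ 
   $z$: sigma score 
   $T$: standard score 
   $Z_{cb}$: College Board standard score 

9. Coefficient of correlation ($r$) 
   $$r = \frac{\sum (z_X)(z_Y)}{N}$$ 
   $$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$ 
   $$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$ 
   $r$: Pearson product-moment coefficient of correlation 
   $\rho$ (rho): Spearman difference in ranks coefficient of correlation 
   $D$: difference between each pair of ranks 

10. Statistical significance of $r$/$\rho$ 
    $$t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$$ 
    Test of the statistical significance of a coefficient of correlation 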
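
The raw score formula for r, the rank-difference formula for rho, and the t test for a correlation coefficient can be sketched in a few lines. This is a minimal illustration in Python; the function names and the data are assumptions made for the example, not material from the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from raw scores: N*sum(XY) - sum(X)*sum(Y) over the two roots."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def spearman_rho(ranks_x, ranks_y):
    """Spearman rho from paired ranks: 1 - 6*sum(D^2) / [N(N^2 - 1)]."""
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def t_for_r(r, n):
    """t value for testing a correlation coefficient, with N - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical data: a perfectly linear pair and a perfectly reversed ranking.
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))
print(t_for_r(0.6, 18))
```

The t value from `t_for_r` is compared with the critical value of t for N - 2 degrees of freedom. The rank-difference formula assumes untied ranks.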
11. Regression line slope 
    $$r = \frac{\text{rise}}{\text{run}} = \frac{z_Y}{z_X}$$ 
    $$b = \frac{\text{rise}}{\text{run}} = \frac{Y}{X} \qquad b = r\left(\frac{S_Y}{S_X}\right)$$ 
    $r$: the slope expressed in sigma ($z$) units 
    $b$: the slope of the line expressed in raw scores 

12. Regression equations 
    $$\hat{Y} = a + bX$$ 
    $$\hat{Y} = a + b_1X_1 + b_2X_2$$ 
    Predicting a Y from a known X when the coefficient of correlation is 
    known 

13. Standard error of estimate ($S_{est}$) 
    $$S_{est} = S\sqrt{1 - r^2}$$ 


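14. Standard error of the difference between two means 
    (independent variances; when variances are not equal) 
    $$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{S_1^2}{N_1} + \frac{S_2^2}{N_2}}$$ 
    (pooled variances; when variances are equal) 
    $$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{(N_1 - 1)S_1^2 + (N_2 - 1)S_2^2}{N_1 + N_2 - 2}\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}$$ 

15. Significance of the difference between two means 
    $$t = \frac{\text{difference between means}}{\text{standard error of the difference}}$$ 
    (uncorrelated or unmatched groups) 
    $$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(N_1 - 1)S_1^2 + (N_2 - 1)S_2^2}{N_1 + N_2 - 2}\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)}}$$ 
    (correlated or matched groups) 
    $$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{N_1} + \dfrac{S_2^2}{N_2} - 2r\left(\dfrac{S_1}{\sqrt{N_1}}\right)\left(\dfrac{S_2}{\sqrt{N_2}}\right)}}$$ 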
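
The pooled-variance t test for two uncorrelated groups can be traced step by step. The sketch below is a minimal Python illustration; the function names and the two small groups of scores are hypothetical, and the variances are the sample estimates that divide by N - 1.

```python
import math


def sample_variance(scores):
    """Variance of a population estimated from a sample (divides by N - 1)."""
    n = len(scores)
    m = sum(scores) / n
    return sum((x - m) ** 2 for x in scores) / (n - 1)


def pooled_t(group1, group2):
    """t for two uncorrelated groups using the pooled-variance standard error.

    Returns the t value and its degrees of freedom (N1 + N2 - 2).
    """
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = sum(group1) / n1, sum(group2) / n2
    pooled = ((n1 - 1) * sample_variance(group1)
              + (n2 - 1) * sample_variance(group2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, n1 + n2 - 2


# Hypothetical scores, used only for illustration.
t_value, df = pooled_t([4, 5, 6], [1, 2, 3])
print(t_value, df)
```

The computed t is then compared with the critical value of t for the returned degrees of freedom at the chosen level of significance.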


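16. Analysis of variance 
    $$F = \frac{MS_b}{MS_w}$$ 
    $$MS_b = \frac{SS_b}{df_b} \qquad MS_w = \frac{SS_w}{df_w}$$ 
    $$SS_b = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} - \frac{(\sum X)^2}{N}$$ 
    $$SS_w = \sum X_1^2 - \frac{(\sum X_1)^2}{n_1} + \sum X_2^2 - \frac{(\sum X_2)^2}{n_2}$$ 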


17. Partial correlation 
    $$r_{12.3} = \frac{r_{12} - (r_{13})(r_{23})}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$ 

18. Chi square ($\chi^2$) 
    $$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$ 
    $f_o$: observed frequencies 
    $f_e$: expected frequencies 
    $df$: degrees of freedom $= (f \text{ rows} - 1)(f \text{ columns} - 1)$ 

    Computation for a 2 x 2 table with cells 

        A  B 
        C  D 

    $$\chi^2 = \frac{N\left(|AD - BC| - \dfrac{N}{2}\right)^2}{(A + B)(C + D)(A + C)(B + D)}$$ 


19. Mann-Whitney test ($N > 20$) 
    $$U_1 = (N_1)(N_2) + \frac{N_1(N_1 + 1)}{2} - \sum R_1$$ 
    $$U_2 = (N_1)(N_2) + \frac{N_2(N_2 + 1)}{2} - \sum R_2$$ 
    $$z = \frac{U - \dfrac{(N_1)(N_2)}{2}}{\sqrt{\dfrac{(N_1)(N_2)(N_1 + N_2 + 1)}{12}}}$$ 
    $N_1$: number in one group 
    $N_2$: number in second group 
    $\sum R_1$: sum of ranks of one group 
    $\sum R_2$: sum of ranks of second group 
    The significance of U is read from the U critical table. When N > 20, 
    the z computation may be used with the normal probability table values. 
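
Two of the nonparametric computations above reduce to simple arithmetic once the cell counts or rank sums are known. The following minimal Python sketch assumes the 2 x 2 chi-square formula with the N/2 correction and the Mann-Whitney formulas as given in items 18 and 19; the function names and the example numbers are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi square for a 2 x 2 table (cells A B / C D) with the N/2 correction."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def mann_whitney_u(n1, n2, sum_ranks_1, sum_ranks_2):
    """U1 and U2 from the group sizes and the sums of ranks of each group."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - sum_ranks_1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - sum_ranks_2
    return u1, u2


def mann_whitney_z(u, n1, n2):
    """Normal approximation for U, intended for larger samples."""
    return (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)


# Hypothetical frequencies and rank sums, used only for illustration.
print(chi_square_2x2(10, 20, 30, 40))
print(mann_whitney_u(3, 3, 6, 15))
```

A useful check on the rank sums is that U1 + U2 always equals (N1)(N2). The example groups are far too small for the z approximation; they only show the arithmetic.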
Appendix B 


PERCENTAGE OF AREA LYING 
BETWEEN THE MEAN AND 
SUCCESSIVE STANDARD 
DEVIATION UNITS UNDER THE 
NORMAL CURVE 


[Table values not recoverable from the source.] 


Example: Between the mean and +1.00z is 34.13% of the area. 
Between the mean and -.50z is 19.15% of the area. 
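
The two example values can be reproduced from the normal distribution itself. This is a minimal sketch in Python using the standard library error function; the function name is an assumption made for the illustration.

```python
import math


def percent_area_mean_to_z(z):
    """Percentage of the area under the normal curve between the mean and z.

    The curve is symmetric, so the sign of z does not change the area.
    """
    return 100 * 0.5 * math.erf(abs(z) / math.sqrt(2))


print(round(percent_area_mean_to_z(1.00), 2))
print(round(percent_area_mean_to_z(-0.50), 2))
```

Any other entry of a normal-curve area table can be generated the same way by passing the desired z value.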


Appendix C 


CRITICAL VALUES FOR PEARSON'S 
PRODUCT-MOMENT 
CORRELATION (r) 


     N    α = .10   α = .05   α = .02   α = .01      df 
     3     .988      .997      .9995     .9999        1 
     4     .900      .950      .980      .990         2 
     5     .805      .878      .934      .959         3 
     6     .729      .811      .882      .917         4 
     7     .669      .754      .833      .874         5 
     8     .622      .707      .789      .834         6 
     9     .582      .666      .750      .798         7 
    10     .549      .632      .716      .765         8 
    11     .521      .602      .685      .735         9 
    12     .497      .576      .658      .708        10 
    13     .476      .553      .634      .684        11 
    14     .458      .532      .612      .661        12 
    15     .441      .514      .592      .641        13 
    16     .426      .497      .574      .623        14 
    17     .412      .482      .558      .606        15 
    18     .400      .468      .542      .590        16 
    19     .389      .456      .528      .575        17 
    20     .378      .444      .516      .561        18 
    21     .369      .433      .503      .549        19 
    22     .360      .423      .492      .537        20 
    23     .352      .413      .482      .526        21 
    24     .344      .404      .472      .515        22 
    25     .337      .396      .462      .505        23 
    26     .330      .388      .453      .496        24 
    27     .323      .381      .445      .487        25 
    28     .317      .374      .437      .479        26 
    29     .311      .367      .430      .471        27 
    30     .306      .361      .423      .463        28 
    35     .282      .333      .391      .428        33 
    40     .264      .312      .366      .402        38 
    50     .235      .276      .328      .361        48 
    60     .214      .254      .300      .330        58 
    70     .198      .235      .277      .305        68 
    80     .185      .220      .260      .286        78 
    90     .174      .208      .245      .270        88 
   100     .165      .196      .232      .256        98 
   200     .117      .139      .164      .182       198 
   500     .074      .088      .104      .115       498 
 1,000     .052      .062      .074      .081       998 
10,000     .0164     .0196     .0233     .0258     9,998 
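
The entries in this table follow from the critical values of t. Solving the significance formula $t = r\sqrt{N - 2}/\sqrt{1 - r^2}$ for r, with df = N - 2, gives $r = t/\sqrt{t^2 + df}$. The sketch below is a minimal Python illustration of that relationship; the function name is an assumption, and the t values passed in are the two-tailed .05 critical values for 10 and 20 degrees of freedom.

```python
import math


def critical_r(critical_t, df):
    """Critical value of r implied by a critical value of t with df = N - 2."""
    return critical_t / math.sqrt(critical_t ** 2 + df)


# Two-tailed .05 critical t values: 2.228 for df = 10 and 2.086 for df = 20.
print(round(critical_r(2.228, 10), 3))
print(round(critical_r(2.086, 20), 3))
```

The results can be compared with the .05 column at df = 10 and df = 20.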
Appendix D 


CRITICAL VALUES OF STUDENT'S 
DISTRIBUTION (t) 


         Two-tailed test          One-tailed test 
         level of significance    level of significance 
  df       .05       .01            .05       .01 
   1     12.706    63.657          6.314    31.821 
   2      4.303     9.925          2.920     6.965 
   3      3.182     5.841          2.353     4.541 
   4      2.776     4.604          2.132     3.747 
   5      2.571     4.032          2.015     3.365 
   6      2.447     3.707          1.943     3.143 
   7      2.365     3.499          1.895     2.998 
   8      2.306     3.355          1.860     2.896 
   9      2.262     3.250          1.833     2.821 
  10      2.228     3.169          1.812     2.764 
  11      2.201     3.106          1.796     2.718 
  12      2.179     3.055          1.782     2.681 
  13      2.160     3.012          1.771     2.650 
  14      2.145     2.977          1.761     2.624 
  15      2.131     2.947          1.753     2.602 
  16      2.120     2.921          1.746     2.583 
  17      2.110     2.898          1.740     2.567 
  18      2.101     2.878          1.734     2.552 
  19      2.093     2.861          1.729     2.539 
  20      2.086     2.845          1.725     2.528 
  21      2.080     2.831          1.721     2.518 
  22      2.074     2.819          1.717     2.508 
  23      2.069     2.807          1.714     2.500 
  24      2.064     2.797          1.711     2.492 
  25      2.060     2.787          1.708     2.485 
  26      2.056     2.779          1.706     2.479 
  27      2.052     2.771          1.703     2.473 
  28      2.048     2.763          1.701     2.467 
  29      2.045     2.756          1.699     2.462 
  30      2.042     2.750          1.697     2.457 
  40      2.021     2.704          1.684     2.423 
  60      2.000     2.660          1.671     2.390 
 120      1.980     2.617          1.658     2.358 
   ∞      1.960     2.576          1.645     2.326 
Appendix E 


ABRIDGED TABLE OF CRITICAL 
VALUES FOR CHI SQUARE 


Level of significance 
df .05 .01 
1 3.84 6.64 
2 5.99 9.21 
3 7.82 11.34 
4 9.49 13.28 
5 11.07 15.09 
6 12.59 16.81 
7 14.07 18.48 
8 15.51 20.09 
9 16.92 21.67 
10 18.31 23.21 
11 19.68 24.72 
12 21.03 26.22 
13 22.36 27.69 
14 23.68 29.14 
15 25.00 30.58 
16 26.30 32.00 
17 27.59 33.41 
18 28.87 34.80 
19 30.14 36.19 
20 31.41 37.57 
21 32.67 38.93 
22 33.92 40.29 
23 35.17 41.64 
24 36.42 42.98 
25 37.65 44.31 
26 38.88 45.64 
27 40.11 46.96 
28 41.34 48.28 
29 42.56 49.59 
Appendix F 


CRITICAL VALUES OF THE F 
DISTRIBUTION 


DF FOR 
DE- DF FOR NUMERATOR 
NOMI- 
NATOR 5 6 7 8 9 10 861 12 
1 572 582 589 594 599 602 605 607 
230 234 237 239 241 242 243 24 
2 929 933 935 937 938 939 940 94 
19.3 19.3 19.4 19.4 19.4 19.4 19.4 19.4 
993 993 994 994 994 994 994 99.4 
3 5.31 5.28 5.27 5.25 5.24 5.23 5.22 5.22 
9.01 8.94 8.89 8.85 8.81 8.79 8.76 8.74 
282 279 277 275 273 272 WA 271 
4 405 401 398 395 394 392 391 39 
626 616 609 604 600 596 594 5.91 
155 152 150 148 147 145 144 144 
5 345 340 337 334 392 330 328 327 
505 495 488 482 47 4.74 4.71 468 
110 107 105 103 102 101 996 989 
6 30 | 378 346 329 318 311 305 301 298 296 294 292 290 
05 | 599 514 476 453 439 428 4.21 415 410 406 403 400 
o | 127 109 978 915 875 847 826 810 798 787 779 772 
7 10 | 359 3.26 307 296 288 283 278 275 272 270 268 267 
05 | ssa 474 435 412 397 387 379 373 368 364 360 357 
o | 1227 955 845 785 746 7.19 699 684 672 662 654 647 
8 30 | 346 311 292 281 273 267 262 259 256 254 252 2.50 
05 | 532 446 407 384 369 358 350 344 339 335 331 328 
o | 113 865 759 7.01 663 637 618 603 591 561 573 567 
9 10 | 336 301 281 269 261 255 251 247 244 242 240 238 
os | 512 426 386 363 348 337 329 323 318 314 310 307 
œ | 106 802 699 642 606 580 561 547 535 526 518 5.11 
10 30 | 329 292 273 261 252 246 241 238 235 232 230 228 
5 | 496 410 371 348 333 322 314 307 302 298 294 29! 
01 10.0 7.56 6.55 5.99 5.64 5.39 5.20 5.06 4.94 4.85 4.77 47 
7" 10 | 323 286 266 254 245 239 234 230 227 225 223 224 
os | 484 398 359 336 320 309 301 295 290 285 282 279 
oi! | 965 721 622 567 532 507 489 474 463 454 448 44 
12 40 | 318 281 261 248 239 233 228 224 221 219 217 215 
vs | 475 as9 349 326 311 300 29! 285 280 275 272 269 
‘on | 933 693 595 541 506 482 464 450 439 430 422 416 
13 130 | 314 278 256 243 235 228 223 220 216 214 212 210 
‘os | 467 381 34! 318 303 292 283 277 271 267 263 200 
‘on | 907 670 574 521 486 462 444 430 419 410 402 396 
14 10 | 310 273 252 239 23 224 219 215 212 210 208 205 
os | 460 374 334 311 296 285 276 270 265 260 257 253 
(3| 886 651 556 504 469 446 4.28 414 403 394 386 380 
15 10 | 307 270 249 236 227 221 216 212 209 206 204 202 
5 | 454 368 329 306 290 279 271 264 259 254 251 248 
‘or | &e8 636 542 489 466 432 414 400 389 380 373 367 
16 30 | 305 267 246 233 224 218 213 209 206 203 201 199 
os | 449 363 324 301 285 274 266 259 254 249 246 242 
i! | &s3 623 529 477 444 420 403 389 378 369 362 355 




DF FOR| 
DE- DF FOR NUMERATOR 
NOMI- 
NATOR | x 15 20 24 30 40 50 60 100 120 200 500 © 
1 40 | 61.2 61.7 620 623 625 627 628 63.0 63.1 63.2 63.3 63.3 
05 246 248 249 250 251 252 252 253 253 254 254 254 
2 40 942 9.44 945 946 947 947 947 948 948 9.49 949 949 
405 | 19.4 194 19.5 195 19.5 19.5 19.5 19.5 19.5 19.5 19.6 19.5 
0t 994 99.4 995 995 995 99.5 995 995 995 99.5 995 99.5 
3 10 520 5.18 5.18 5.17 5.16 515 515 514 5.14 5.14 514 6.13 
05 870 866 864 862 859 858 857 855 855 854 8.53 8.53 
ot 269 267 266 265 264 264 26.3 262 26.2 262 26.1 26.1 
4 410 3.87 384 383 382 380 350 379 378 3.78 3.77 3.76 3.76 
05 586 580 577 5.75 572 5.70 5.69 5.66 5.66 5.65 5.64 5.63 
01 142 140 139 13.8 13.7 137 137 136 13.6 13.5 13.5 13.5 
5 10 3.24 3.21 319 3.17 $16 315 $14 313 3.12 3.12 3.11 3.10 
05 462 456 4.53 4.50 446 4.44 443. 441 440 4.39 437 436 
01 972 9.55 9.47 9.38 929 924 920 9,13 9.11 9.08 8.04 9.02 
6 10 2.87 2.84 2.82 2.80 278 277 2276 275 274 273 273 272 
05 3.94 3.87 384 3.81 3.77 3.75 374 3$7 $70 369 3.68 3.67 
| 0t 7.56 740 731 7.28 744 7.09 , 706 699 6.97 6.93 690 688 
7 10 2.63 2.59 2.58 256 254 252 251 250 249 248 248 247 
5 05 3.51 344 341 3.38 3934 332 $30 327 327 3.25 324 323 
10 631 616 607 599 5.91 5.86 582 575 5.74 5.70 5.67 5.65 
8 10 2.46 242 240 2.38 2.36 235 234 232 232 231 2.30 2.29 
05 322 315 3.12 3.08 3.04 3.02 3.01 2.97 2.97 2.95 294 2.93 
-01 5.52 536 528 5.20 512 5.07 503 496 495 4.91 4.88 4.86 
9 440 2.34 2.30 228 225 223 222 221 219 218 217 217 2.16 
.05 3.01 2.94 2.90 2.86 283 280 279 2:76 2.75 273 272 2.71 
01 496 481 473 465 457 452 448 442 440 436 433 431 
10 10 2.24 220 218 216 213 2.12 211 2.09 208 2.07 206 2.06 
05 285 277 274 2.70 266 2.64 2.62 2.59 258 256 2.55 2.54 
ot 456 441 4.33 4.25 447 412  À 408 401 400 396 393 391 
" 10 247 212 210 2.08 2.05 2.04 203 2.00 2.00 1.99 1.96 1.97 
.05 272 2.65 2.61 2.57 253 251 249 246 245 243 242 240 
E 4.25 410 402 3.94 $86 381 3.78 3.71 3.69 3.66 3.62 3.60 
12 10 2.10 2.06 2.04 2.01 1.99 1.97 1.96 1.94 1.93 1.92 1.91 1.90 
05 2.62 254 2.51 2.47 243 2.40 2.38 235 234 232 2.31 2.30 
01 401 $86 378 370 362 3.57 354 3.47 3.45 3.41 3.38 3.36 
13 10 2.05 2.01 1.98 1.96 1.93 1.92 1.90 1.88 1.88 1.86 1.85 1.85 
05 2.53 246 242 2.38 2.34 231 230 226 2.25 2.23 2.22 221 
-01 3.82 3.66 $59 351 3.43 3.38 3.34 3.27 325 3.22 3.19 3.17 
14 10 2.01 1.96 1,94 1.91 1.89 1.87 1.86 1.83 183 1.83 1.80 1.80 
05 2.46 2.39 2.35 2.31 2.27 2.24 222 219 2.18 2.18 244 2.13 
01 3.66 3.51 3.43 3.35 327 322 3.18 3.11 3.09 $09 303 3.00 
15 40 197 1.92 1.90 1.87 1.85 1.83 1.82 1.79 1.79 177 176 1,76 
-05 240 233 229 2.25 2.20 2.18 216 212 2:11 210 208 2.07 
t 3.52 337 329 3.21 313 3.08 3.05 2.98 2.96 2.92 289 2.87 
16 0 [194 1.89 1.87 1.84 1.81 179 178 176 1,75 1.74 1.73 1.72 
05 235 228 2.24 2.19 2.15 2.12 211 2.07 2.06 2.04 2.02 201 
01 341 3.26 3.18 3.10 3.02 297 293 286 2.84 2.81 2.78 2.75 
DF FOR NUMERATOR 
x NOMI- 
1 2 3 4 5 6 7 8 9 10 " 12 NATOR 


3.03 2.64 2.44 2.31 222 215 210 206 203 200 1.98 1.96 | 10 7 
3.01 262 242 229 220 213 208 204 200 1.98 196 1.93 | 10 18 


299 261 240 227 218 211 206 202 1.98 1.96 194. 1.91 10 19 


2.93 254 233 219 210 204 198 194 1.91 1.88 185 183 | .10 24 


273 233 21! 1.97 1.88 180 175 170^ 166 ^163 «160. 157 10 200 
389 304 265 242. 226 214 206 198 193 4188. 184 1.80. | .05 
676 471 3.88 3.41 311 289 273 260 250 241 234 | 227 01 

230 208 194 185 177 172 167 163 160 157 -1.55 


This table is abridged from Table 18 in Biometrika Tables for Statisticians, vol. 1, 2nd 
ed. New York: Cambridge, 1958. Edited by E. S. Pearson and H. O. Hartley. Reproduced with 
the kind permission of the editors and the trustees of Biometrika. 
DE- DF FOR NUMERATOR 


18 10 189 184 181 178 175 174 172 170 169 168 167 166 


19 10 1.86 181 179 176 173 uA 170 167 1.67 1.65 1.64 1.63 


24 10 1.78 173 170 1.67 1.64 1.62 1.61 1.58 1.57 1.56 1.54 1.53 


120 -10 1.55 1.48 145 141 1.37 134 1.32 1.27 1.26 1.24 1.21 1.19 


200 10 1,52 1.46 142 1.38 134 131 128 1.24 122 1.20 117 144 
05 172 1.62 157 1.52 1.46 141 1.39 1.32 1.29 1.26 1.22 1.19 j 
01 2.13 1.97 1.89 1.79 1.69 1.63 1.58 1.48 144 1.39 1.33 1.28 f 




Appendix G 


RESEARCH COURSE REPORT 


EVALUATION 


RESEARCH REPORT EVALUATION FORM 


Name ____________________   Date ____________   Grade ________ 


+ adequate     - inadequate 


TITLE 
    clear and concise 

PROBLEM AND HYPOTHESES 
    clearly stated 
    specific questions raised 
    clear statement of hypothesis 
    testable hypothesis 
    significance recognized 
    properly delimited 
    assumptions stated 
    important terms defined 

REVIEW OF RELATED LITERATURE 
    adequately covered 
    well-organized 
    important findings noted 
    studies critically examined 
    effectively summarized 

PROCEDURES 
    described in detail 
    adequate sample 
    appropriate design 
    variables controlled 
    effective data-gathering instruments or procedures 

DATA ANALYSIS 
    perceptive recognition of data relationships 
    effective use of tables 
    effective use of figures 
    concise report of findings 
    appropriate statistical treatment 
    logical analysis 

SUMMARY 
    problem restated 
    questions/hypothesis restated 
    procedures described 
    concisely reported 
    supporting data included 
    conclusions based on data analysis 

FORM AND STYLE 
    typing 
    spacing 
    margins 
    balance 
    table of contents 
    list of tables 
    list of figures 
    headings 
    pagination 
    citations/quotations 
    footnotes 
    tables 
    figures 
    bibliography 
    appendix 
    spelling 
    punctuation 
    sentence structure 
    proofreading 
    clear and concise style 
ANSWERS TO STATISTICS 
EXERCISES 


CHAPTER 8 


Agree. The median could be lower than the mean if a large proportion of 
the families had low incomes. 

Disagree. The median is that point in a distribution above and below which 
half of the scores fall. It may not be the midpoint between the highest and 
the lowest scores. 


M = 55.33 

Md = 58.50 

M = 75 Range = 31 
Md = 77 


Variance = 41.33 

Standard deviation = 6.43 

Disagree. The range does not determine the magnitude of the variance or 
the standard deviation. These values indicate how all of the scores, not the 
most extreme, are clustered about the mean. 


a. no change d. +5 
b. +5 e. no change 
eut f. no change 


Percentile rank = 93. 


CHAPTER 9 


10. 


1l. 


12. 


13. 


14. 


15. 
16. 
17. 
18. 


19; 


20. 
21. 
M = 72    standard deviation = 6 


z        -3   -2   -1    0   +1   +2   +3 
heads    54   60   66   72   78   84   90 

 X      x       z       Z 
66     +5     +1.00     60 
58     -3      -.60     44 
70     +9     +1.80     68 
61      0       0       50 
52     -9     -1.80     32 
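
The conversions in the table above follow the standard score formulas in Appendix A: z = (X - mean)/standard deviation, and a standard score of 50 + 10z. The sketch below is a minimal Python illustration. The function names are assumptions, and the mean of 61 and standard deviation of 5 are inferred from the rows of the table rather than stated in the text.

```python
def z_score(x, mean, sd):
    """Sigma score: deviation from the mean in standard deviation units."""
    return (x - mean) / sd


def t_score(z):
    """Standard score with a mean of 50 and a standard deviation of 10."""
    return 50 + 10 * z


def college_board_score(z):
    """College Board standard score: mean of 500, standard deviation of 100."""
    return 500 + 100 * z


# Mean and standard deviation inferred from the answer table (an assumption).
for x in (66, 58, 70, 61, 52):
    z = z_score(x, 61, 5)
    print(x, x - 61, round(z, 2), round(t_score(z)))
```

Each printed row gives the raw score, its deviation from the mean, its z score, and the corresponding standard score.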


a 19 E 18) 


b. 8976 — g. +2.33; 
c..8796 h. —.67,to + .67, 
d. 6% i. 32 
ENESE Pa | Hag 
a. 119 d. 107 
b. 96 e. 85 to 115 
c. 75% 
TOM DONNA HARRY 
algebraz -1.00 +.33 sre 
historyz +1.25 +.50 T2 


a. Tom d. Tom 

b. Tom e. Donna 

c. Harry f. Harry 

Disagree. The coefficient of correlation is an indication of the magnitude of 
the relationship, but does not necessarily indicate a cause-and-effect rela- 


tionship. 

rho = +.61 

r= +65 

r= +.53 

r= —1.00 most correct most incorrect 
least correct, least incorrect 

Agree. r = rise/run, expressed in sigma units 
       b = rise/run, expressed in raw scores 


The value of r cannot exceed ±1.00. 
The value of b can exceed ±1.00. 


Sia = 4.96 
a. Y= 44 
b. Y = 36 


Confirming a positive hypothesis provides a weak argument, for the conclu- 
sion may be true for other reasons. It does not preclude the validity of 
alternative or rival hypotheses. Rejecting a negative hypothesis employs stronger 
logic. 

Agree. A test of statistical significance provides a basis for accepting or re- 
jecting a sampling error explanation on a probability basis. Only when a 
sampling process is involved is a test of significance appropriate. 

Agree. The level of significance determines the probability of a sampling 
error, rather than a treatment variable explanation. When a researcher finds 
an observation significant at the .05 level, he or she is admitting that there 
is a 5/100 chance of a sampling error explanation. 

Disagree. The .01 alpha level is a much more rigorous criterion than the .05 
level. However, any hypothesis that can be rejected at the .01 level can surely 
be rejected at the .05 level of significance. 

Disagree. The t critical value for a one-tailed test is lower. The area of rejection 
is one side of the normal curve and it is not necessary to go out as far to 
reach it. 


t CRITICAL VALUES FOR REJECTION 

              Two-tailed   One-tailed 
.05 level        1.96         1.64 
.01 level        2.58         2.33 

t= —2.00 Reject the null hypothesis. The cable did not meet the man- 
ufacturer's specifications. 
t = 3.49 Reject the null hypothesis. The means do not behave as 
sample means from the same population. 
t= 1.38 Do not reject the null hypothesis. There was no significant 
difference between the achievement of the two groups. 
t= 3.77 Reject the null hypothesis. The weight gain for the exper- 
imental group was significant. 
t= 3.13 Reject the null hypothesis. The difference in gasoline mile- 
age was significant. 
a NMED 
b. MESI 
Cred) Nam <2 
d. 1 
e. 8 
x? = 14.06 Reject the null hypothesis. There seems to be a significant 
relationship between gender and brand preference. 
z = .28 Do not reject the null hypothesis. The effect of the coun- 
seling program did not seem to be statistically significant. 
t= 1.26 Do not reject the null hypothesis. The coefficient of cor- 
relation was not statistically significant. 
Y = 946 
SELECTED INDEXES, ABSTRACTS, 
AND REFERENCE MATERIALS 


References about References 


There are a number of publications that identify specific references that 
cover particular areas of knowledge. 


American Reference Books Annual. Bohdan S. Wynar, ed. Littleton, 
CO.: Libraries Unlimited, 1970— date. 


Most reference books published or distributed in the United States 
are reviewed. Reviews, written by more than 200 library specialists, vary 
in length from 75 to 300 words, and are not cumulated from year to year. 
This is probably the most complete and up-to-date reference on references 
available. 


A Guide to Reference Books (10th ed.). Eugene P. Sheehy, compiler. 
Chicago: American Library Association, 1986. 


This comprehensive work lists, without evaluation, by subject area, 
by type, and by author or editor, the most important reference books 
printed in a number of languages. A section is devoted to education. Sup- 
plements appear every two or three years. 
Wynar, Christine L. Guide to Reference Books for School Media Centers. 
Littleton, CO.: Libraries Unlimited, 1976. 475 pp. 


This guide includes 2575 entries with evaluative comments on ref- 
erence books and selection tools for use in elementary schools, junior and 
senior high schools, and community and junior colleges. It is indexed by 
author, subject, and title. 


Reference Books Review Index. Ann Arbor, MI.: Pierian Press, 1978. 
This annotated listing of references issues supplements quarterly. 
Booklist. Chicago: American Library Association, 1905—date. 


Published biweekly and cumulated every two years, this reference 
presents an unbiased critical analysis by expert librarians of atlases, ency- 
clopedias, biographical works, dictionaries, and other reference materials 
in terms of their usefulness and reliability for libraries or homes. 


Cumulative Book Index. New York: H. W. Wilson Co., 1898— date. 


This monthly publication, cumulated semiannually and in one- and 
two-year cumulations, indexes all books published in the English language 
by author, title, and subject. It is helpful in assuring the student that all 
pertinent books have been covered in his or her searches. 


Books in Print 

Subject Guide to Books in Print. R. R. Bowker Co., 1948— date. 6 vols. 
Vols. 1—3 Authors 

Vols. 4—6 Titles and Publishers 


These multivolume comprehensive listings of in-print titles list names 
of publishers and other publication information. 




The Standard Periodicals Directory. New York: Oxbridge Publishing 
Co., 1964— date. 


Published every other year, this directory of over 30,000 entries covers 
every type of periodical, with the exception of local newspapers. Periodicals 
are defined as publications appearing at least once every two years. Two 
hundred classifications are arranged by subject. An alphabetical index is 
provided. 


Ulrich’s International Periodicals Directory. New York: R. R. Bowker 
Co., 1966—date. 2 vols. 
This classified list of more than 57,000 foreign and domestic periodicals 
is arranged by subject and title. Publication information is provided. 


Irregular Serials and Annuals. An International Directory: Excepting 
Periodicals Issued More Frequently than Once a Year. R. R. Bowker 
Co., 1972- date. 


Published biennially, this directory includes more than 20,000 pub- 
lications. 


Sources of Information in the Social Sciences (3rd ed.). Chicago: Amer- 
ican Library Association, 1986. 


Organized by subject area and indexed by author and title, this work 
contains a comprehensive listing and brief description of reference books, 
monographs, and scholarly journals. 


Schorr, Alan E. Government Reference Books: A Biennial Guide to 
United States Government Publications. Littleton, CO.: Libraries Un- 
limited, 1968/69— date. 


This guide describes more than 1300 publications. 


Indexes 


A periodical index serves much the same purpose as the index of a book 
or the card file of a library. Usually listing articles alphabetically under 
subject, title, and author headings, the sources of periodical articles are 
indicated. Readers should read the directions for the use of an index before 
trying to locate references. Most indexes provide complete directions, as 
well as a list of the periodicals covered, the issue dates included, and a key 
to all abbreviations used. 


Education Index. New York: H. W. Wilson Co., 1929—date. Published 
monthly (September through June), and cumulated annually. 


Canadian Education Index. Ottawa, Ontario: Canadian Council for 
Educational Research, 1965-— date. 


Issued quarterly, this publication indexes periodicals, books, pam- 
phlets, and reports published in Canada. 


Current Contents: Education. Philadelphia: Institute for Scientific 
Information and Encyclopedia Britannica Educational Corporation, 
1969 — date. 
Issued weekly, this publication reproduces the table of contents of 
more than 500 foreign and domestic educational periodicals. It contains 
an author index and address directory to facilitate writing for reprints of 
the articles and to identify the author's organization. Reprints are available 
directly from the Institute for Scientific Information. 


Current Index to Journals in Education. Phoenix, AZ.: Oryx Press, 
1969— date. 


This index is issued monthly and cumulated semiannually and annually. 
It indexes approximately 20,000 articles each year from more than 
700 education and education-related journals and is a joint venture with the 
National Institute of Education. 


Index of Doctoral Dissertations International. Ann Arbor, MI.: Xerox 
University Microfilms, 1956—date. 


Published as issue 13 of Dissertation Abstracts International each year, 
this work consolidates into one list all dissertations accepted by American, 
Canadian, and some European universities during the academic year, as 
well as those available in microfilm. It indexes by author and key words 
selected from dissertation titles. 


Readers’ Guide to Periodical Literature. New York: H. W. Wilson Co., 
1900— date. 


Issued twice each month, Readers’ Guide indexes by subject and author 
articles of a popular and general nature. Prior to 1929, Readers’ Guide 
covered many of the educational periodicals. By 1929, the number of ed- 
ucational periodicals had become so great that the Education Index was 
established as a more specialized guide. Readers’ Guide may be helpful to 
students in education for finding references to articles in areas outside the 
field of professional education. 

Abridged Readers’ Guide to Periodical Literature. New York: H. W. 

Wilson Co., 1935— date. 


Fifty-six selected periodicals most likely to be found in smaller libraries 
are indexed here. 


New York Times Index. New York, 1913— date. 


This index is published biweekly with annual cumulation, and it classifies 
material in the New York Times alphabetically and chronologically 
under subject, title, person, and organization name. It is also useful in 
locating materials in other newspapers because it gives a clue to the date 
of events. Complete issues of the New York Times are available in microfilm 
form in many libraries. 


Subject Index to the Christian Science Monitor. Boston: Christian Sci- 
ence Monitor, 1960— date. 


This publication is issued monthly with annual cumulations. 

Social Sciences Index. New York: H. W. Wilson Co., 1974— date. 

This guide indexes 263 periodicals. 

Humanities Index. New York: H. W. Wilson Co., 1974—date. 

Formerly published as Social Sciences and Humanities Index (1965— 1973), 
the Humanities Index lists 260 periodicals. These two indexes, each issued 
quarterly and cumulated annually, index alphabetically by subject and title 
articles from more than 260 periodicals, including many published outside 
the United States. 


Physical Education/Sports Index. Albany, NY: Marathon Press, 1978— 
date. 


This quarterly covers more than 100 journals. Since Education Index 
and Current Index to Journals in Education cover fewer than 10 physical 
education journals, these indexes provide an important additional source. 


Rehabilitation Literature. Chicago: The National Society for Crip- 
pled Children and Adults, 1940— date. 


Published monthly, this index lists material on the physically handi- 
capped. 


Abstracts 


Another type of reference guide is the abstract, review, or digest. In ad- 
dition to providing a systemized list of reference sources, it includes a 
summary of the contents. Usually the summaries are brief, but in some 
publications they are presented in greater detail. 


Dissertation Abstracts International. Ann Arbor, MI.: Xerox Univer- 
sity Microfilms, 1955— date. 
Dissertations accepted by most universities in the United States and 
Canada and some in foreign countries are indexed by author and key word. 
Libraries or individuals may purchase complete xerographic or microfiche 
copies of any dissertation. 


Master's Abstracts International. Ann Arbor, MI.: Xerox University 
Microfilms, 1962—date. 


Issued semiannually, this guide abstracts those master's degree theses 
that are available on microfilm. 


Resources in Education. Washington, D.C.: Superintendent of Doc- 
uments, Government Printing Office, 1966—date. 


This monthly abstract journal, prepared by the National Institute of
Education, reports new and completed research projects gathered by the
16 clearinghouses of the Educational Resources Information Center (ERIC).


Completed Research in Health, Physical Education and Recreation Including
International Sources. Washington, D.C.: American Alliance
for Health, Physical Education and Recreation, 1958–date.


Issued annually, this work indexes by subject and title abstracts of 
studies conducted throughout the world. 


Child Development Abstracts and Bibliography. Chicago: University
of Chicago Press, 1927–date.


Issued every four months and cumulated every three years, this pub- 
lication abstracts more than 20 journals. 


Exceptional Child Education Resources. Arlington, VA.: Council for
Exceptional Children, 1969–date.


Issued quarterly, this publication indexes and abstracts books,
periodicals, and government documents.


Psychological Abstracts. Washington, D.C.: American Psychological
Association, 1927–date.


Issued bimonthly and indexed annually by subject and author, this 
publication has excellent signed summaries of psychological research re- 
ports. The December issue provides annual cumulative author and subject 
indexes. Beginning in 1963, each issue is also indexed by both subject and
author. Libraries may also provide a cumulative subject index (1927–1960)
and a cumulative author index (1927–1963).


Annual Review of Psychology. Palo Alto, CA.: Annual Reviews, 1950— 
date. 


Each issue of this annual volume contains critical reviews of the lit- 
erature in some 15 topical areas of contemporary psychology. Each review 
is written by a recognized authority on the topic. Although different authors 
writing in different years may vary considerably in their interpretation and 
handling of the same topic, all aim for comprehensive coverage of new 
developments. 


Psychological Bulletin. Washington, D.C.: American Psychological 
Association, 1904—date. 


Issued bimonthly, the Bulletin publishes evaluative reviews of research
literature and methodology.


Sociological Abstracts. San Diego, CA.: Sociological Abstracts, Inc., 
1952—date. 


Issued five times a year and cumulated annually, the Abstracts cover
all areas of sociology, including educational sociology. The work abstracts
articles and presents book reviews from several hundred periodicals, both
domestic and foreign.


Social Work Research and Abstracts. New York: National Association 
of Social Workers, 1965—date. 


Published quarterly, this volume is indexed by subject, title, and author.
It combines reports of published research with the abstracts formerly
issued in the journal Abstracts for Social Workers.


National School Law Reporter. New London, CT.: Croft Educational
Services, 1955–date.


This biweekly publication abstracts court decisions on school law.


Research-oriented periodicals 


There are many publications in education and in closely related areas that
report research activity. Some of these publications are exclusively
research-oriented. Others present both research reports and feature-type
articles. Beginning researchers may not be familiar with many of the
specialized publications that deal with the problem area selected. Browsing
through these periodicals provides an effective introduction to the field.
The student may also find recent and current reports that have not yet
appeared in the appropriate index.

The following list of periodicals may be helpful to those who are 
planning a research project. 


EDUCATION 


Administrative Science Quarterly 

Adolescence 

Adult Education

Adult Jewish Education 

Alberta Journal of Educational Research 

American Association of University Professors Bulletin

American Behavioral Scientist 

American Biology Teacher 

American Education 

American Educational Research Journal 

American Vocational Journal 

Arbitration in the Schools 

Arithmetic Teacher 

Audio-Visual Communications Review 

Audio-Visual Language Journal 

Black Scholar 

Bulletin of the National Association of Secondary School Principals

Business Education Forum 

Business Education Quarterly 

California Journal of Educational Research

Catholic Educational Review 

Character Education Journal 

Child Care Quarterly 

Child Development 

Child Study Journal 

Child Welfare 

Children Today 

Childhood Education 

Civil Rights Digest 

Clearing House 

College Board Review 

Colorado Journal of Educational Research

Community and Junior College Journal 

Comparative Education 

Comparative Education Review 


Computers and Education 

Continuing Education 

Convergence 

Education and Urban Society 

Educational Administration Quarterly 

Educational Forum 

Educational Leadership 

Educational Record 

Educational Researcher 

Educational Research Quarterly 

Educational Technology 

Elementary School Journal 

Evaluation Quarterly 

Harvard Educational Review 

High School Journal 

History of Education Quarterly 

Home Economics Research Journal 

Human Development 

Illinois School Research 

Independent School Bulletin 

Indian Historian 

Integrated Education 

International Journal of Aging and Human Development

International Journal of Educational Science

Jewish Education

Journal for Research in Mathematics Education

Journal for the Study of Religion 

Journal of Afro-American Issues 

Journal of Alcohol and Drug Education 

Journal of American Indian Education 

Journal of Business Education

Journal of Communication 

Journal of Computer-Based Instruction 

Journal of Creative Behavior 

Journal of Drug Education 

Journal of Educational Data Processing 


Journal of Educational Measurement
Journal of Educational Research
Journal of Educational Statistics
Journal of Experimental Education
Journal of Higher Education
Journal of Home Economics
Journal of Industrial Teacher Education
Journal of Law and Education
Journal of Legal Education
Journal of Leisure Research
Journal of Library Research
Journal of Negro Education
Journal of Religion
Journal of Research and Development in Education
Journal of Research in Mathematics Education
Journal of Research in Music Education
Journal of Research in Science Teaching
Journal of Social Studies Research
Journal of Teacher Education
Junior College Education
Junior College Journal
Kappa Delta Pi Record
Library Resources and Technical Services
Library Quarterly
Mathematics Teacher
Measurement in Education
Merrill Palmer Quarterly
Microfilm Review
Modern Language Journal
Multivariate Behavioral Research
National Business Education Quarterly
National Catholic Educational Association Bulletin
National Education Association Research Bulletin
National Elementary Principal
National Society for Programmed Instruction Journal
Negro Educational Review
New England Association Quarterly
North Central Association Quarterly
Outlook
Peabody Journal of Education
Phi Delta Kappan
Phylon
Pollution Abstracts
Practical Application of Research
Programmed Instruction
Psychometrika
Public Opinion Quarterly
Religion Teachers Journal
Religious Education
Research in Higher Education
Research in the Teaching of English
Review of Educational Research
Review of Religious Research
School and Society
School Law Journal
School Law Reporter
School Review
School Science and Mathematics
Science
Science Education
Science and Children
Science Teacher
Social Education
Social Science Research
Speech Monographs
Speech Teacher
Teachers College Record
Theory and Research in Social Education
Theory into Practice
Times Educational Supplement
UCLA Educator
Visual Education
Young Children


SOCIOLOGY


American Anthropologist
American Behavioral Scientist
American Journal of Sociology
American Sociological Review
Ethnology
Federal Probation
Human Relations
Journal of American Indian Education
Journal of Applied Behavioral Science
Journal of Correctional Education
Journal of Educational Sociology
Journal of Experimental Social Psychology
Journal of Marriage and the Family
Journal of Research in Crime and Delinquency
Rural Sociology
Social Behavior and Personality
Social Case Work
Social Education
Social Forces
Social Problems
Social Psychology
Social Work
Sociological Methods and Research
Sociological Record
Sociology of Education
Sociology and Social Research
Sociometry
Teaching Sociology
Urban Education
Urban Review


PSYCHOLOGY


American Journal of Orthopsychiatry
American Journal of Psychiatry
American Journal of Psychology
American Psychologist
Applied Psychological Measurement
Behavioral Disorders
British Journal of Educational Psychology
British Journal of Psychology
Catholic Psychological Record
Cognitive Psychology
Contemporary Educational Psychology
Educational and Psychological Measurement
Genetic Psychology Monographs
Journal of Abnormal Psychology
Journal of Applied Psychology
Journal of Autism and Childhood Schizophrenia
Journal of Clinical Psychology
Journal of Comparative and Physiological Psychology
Journal of Consulting and Clinical Psychology
Journal of Counseling Psychology
Journal of Creative Behavior
Journal of Educational Psychology
Journal of Experimental Child Psychology
Journal of General Psychology
Journal of Genetic Psychology
Journal of Humanistic Psychology
Journal of Mental and Nervous Disease
Journal of Personality
Journal of Personality and Social Psychology
Journal of Personal Assessment
Journal of Psychiatric Research
Journal of Psychology
Journal of Research in Personality
Journal of School Psychology
Journal of Social Psychology
Journal of Verbal Learning and Behavior
Learning and Motivation
Mental Hygiene
Pastoral Psychology
Perceptual and Motor Skills
Personnel Psychology
Psychiatry
Psychoanalytic Quarterly
Psychological Abstracts
Psychological Bulletin
Psychological Monographs
Psychological Record
Psychological Reports
Psychological Review
Psychology in the Schools
Psychology of Women Quarterly
Small Group Behavior
Transactional Analysis Journal


HEALTH AND PHYSICAL EDUCATION


American Journal of Nursing
American Journal of Occupational Therapy
American Journal of Physical Medicine
American Journal of Public Health
Athletic Journal
Health and Education Journal
Health Education
Journal of the American Dietetic Association
Journal of the American Medical Association
Journal of the American Physical Therapy Association
Journal of Clinical Nutrition
Journal of Continuing Education in Nursing
Journal of Drug Education
Journal of Health and Social Behavior
Journal of Health, Physical Education and Recreation
Journal of Medical Education
Journal of Mental Health
Journal of Nursing Education
Journal of Nutrition
Journal of Pediatrics
Journal of Rehabilitation


GUIDANCE AND COUNSELING 


American Vocational Journal 

British Journal of Guidance and Counseling

California Personnel and Guidance Association Journal

Canadian Counsellor 

Counselor Education and Supervision